This module builds on code contained in Coronavirus_Statistics_USAF_v005.Rmd. This file includes the latest code for analyzing data from USA Facts. USA Facts maintains data on cases and deaths by county for coronavirus in the US. Downloaded data are unique by county with date as a column and a separate file for each of cases, deaths, and population.
The intent of this code is to source updated functions that allow for readRunUSAFacts() to be run to obtain, read, process, and analyze data from USA Facts.
The tidyverse library is loaded, and the functions used for CDC daily processing are sourced. Additionally, specific functions for USA Facts are also sourced:
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.1 v dplyr 1.0.6
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Functions are available in source file
source("./Generic_Added_Utility_Functions_202105_v001.R")
source("./Coronavirus_CDC_Daily_Functions_v001.R")
source("./Coronavirus_USAF_Functions_v001.R")
Further, the mapping file specific to USA Facts is sourced:
source("./Coronavirus_USAF_Default_Mappings_v001.R")
Updated functions for diagnoseClusters(), createDetailedSummaries(), createSummary(), and helperSummaryMap() are added to Coronavirus_USAF_Functions_v001.R. These functions should be checked for consistency with state-level data with just a single copy kept later.
The functions are tested on previously downloaded data, with results cached:
urlMapper[["usafCase"]] <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
urlMapper[["usafDeath"]] <- "https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv"
urlMapper[["usafPop"]] <- "https://static.usafacts.org/public/data/covid-19/covid_county_population_usafacts.csv"
readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220308.csv",
"usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220308.csv"
)
compareList <- list("usafCase"=readFromRDS("cty_newdata_20220202")$dfRaw$usafCase,
"usafDeath"=readFromRDS("cty_newdata_20220202")$dfRaw$usafDeath
)
# Use existing clusters
cty_chkdata_20220308 <- readRunUSAFacts(maxDate="2022-03-06",
downloadTo=lapply(readList,
FUN=function(x) if(file.exists(x)) NA else x
),
readFrom=readList,
compareFile=compareList,
writeLog="./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log",
ovrwriteLog=TRUE,
useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
skipAssessmentPlots=FALSE,
brewPalette="Paired"
)
##
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220308.csv
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 32
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 6 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 228 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
##
##
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220308.csv
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 32
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 44 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 253 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType cases new_cases n
## <chr> <dbl> <dbl> <dbl>
## 1 before 1.92e+10 7.73e+7 2468189
## 2 after 1.90e+10 7.69e+7 2428766
## 3 pctchg 5.39e- 3 5.14e-3 0.0160
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType deaths new_deaths n
## <chr> <dbl> <dbl> <dbl>
## 1 before 3.27e+8 951208 2468189
## 2 after 3.20e+8 907057 2428766
## 3 pctchg 2.11e-2 0.0464 0.0160
## NULL
# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_chkdata_20220308$useClusters,
caption="Includes only counties with 25k+ population",
brewPalette="viridis"
)
# Save the check file
saveToRDS(cty_chkdata_20220308, ovrWriteError=FALSE)
##
## File already exists: ./RInputFiles/Coronavirus/cty_chkdata_20220308.RDS
##
## Not replacing the existing file since ovrWrite=FALSE
## NULL
# Confirm that it is identical to the previous process
cty_newdata_20220308 <- readFromRDS("cty_newdata_20220308")
# Same names in the list
all.equal(names(cty_chkdata_20220308), names(cty_newdata_20220308))
## [1] TRUE
# Identical items in the list
sapply(names(cty_chkdata_20220308),
FUN=function(x) identical(cty_chkdata_20220308[[x]], cty_newdata_20220308[[x]])
)
## countyData dfRaw dfProcess dfPerCapita useClusters maxDate
## TRUE TRUE TRUE TRUE TRUE TRUE
## plotDataList
## FALSE
# ggplot2 objects are never identical due to environment; confirm they are all.equal
all.equal(cty_chkdata_20220308$plotDataList, cty_newdata_20220308$plotDataList)
## [1] TRUE
The capability for obtaining and processing county-level vaccines data is included:
# Read the relevant vaccines data
vaxPartialRaw_20220309 <- downloadCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20220309.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## Date = col_character(),
## FIPS = col_character(),
## Recip_County = col_character(),
## Recip_State = col_character(),
## SVI_CTGY = col_character(),
## Metro_status = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## Records from other than 50 states and DC:
## # A tibble: 9 x 2
## state n
## <chr> <int>
## 1 AS 442
## 2 FM 447
## 3 GU 897
## 4 MH 434
## 5 MP 444
## 6 PR 35625
## 7 PW 441
## 8 UNK 308
## 9 VI 1794
vaxPartialRaw_20220309
## # A tibble: 1,480,110 x 10
## date FIPS countyName state vxcpoppct vxcgte18pct vxcgte65pct pop
## <date> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2022-03-08 06101 Sutter County CA 60.3 73 87.7 96971
## 2 2022-03-08 12009 Brevard Coun~ FL 64 72.9 88.4 601942
## 3 2022-03-08 19085 Harrison Cou~ IA 51.6 62.1 88.9 14049
## 4 2022-03-08 06069 San Benito C~ CA 66 75.6 83.3 62808
## 5 2022-03-08 19057 Des Moines C~ IA 51.1 61.9 83.4 38967
## 6 2022-03-08 05073 Lafayette Co~ AR 41.2 48.5 57.4 6624
## 7 2022-03-08 16021 Boundary Cou~ ID 34.8 42.8 63.6 12245
## 8 2022-03-08 20073 Greenwood Co~ KS 49 59.6 80.6 5982
## 9 2022-03-08 33005 Cheshire Cou~ NH 61.6 69.1 89.6 76085
## 10 2022-03-08 37159 Rowan County NC 43.8 52.5 74.1 142088
## # ... with 1,480,100 more rows, and 2 more variables: popgte18 <dbl>,
## # popgte65 <dbl>
# Repair the data for 65+
vaxFix65_20220309 <- repairVaxPopulation(vaxPartialRaw_20220309, colsRepair=c("popgte65"))
##
## Count of NA records by column
## state FIPS popgte65_minpop popgte65_maxpop popgte65_nnA
## 0 0 0 0 0
## n
## 0
##
## Records where minimum and maximum population differ# A tibble: 0 x 5
## # ... with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## # maxpop <dbl>
vaxFix65_20220309
## # A tibble: 1,417,042 x 10
## date FIPS countyName state vxcpoppct vxcgte18pct vxcgte65pct pop
## <date> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2022-03-08 06101 Sutter County CA 60.3 73 87.7 96971
## 2 2022-03-08 12009 Brevard Coun~ FL 64 72.9 88.4 601942
## 3 2022-03-08 19085 Harrison Cou~ IA 51.6 62.1 88.9 14049
## 4 2022-03-08 06069 San Benito C~ CA 66 75.6 83.3 62808
## 5 2022-03-08 19057 Des Moines C~ IA 51.1 61.9 83.4 38967
## 6 2022-03-08 05073 Lafayette Co~ AR 41.2 48.5 57.4 6624
## 7 2022-03-08 16021 Boundary Cou~ ID 34.8 42.8 63.6 12245
## 8 2022-03-08 20073 Greenwood Co~ KS 49 59.6 80.6 5982
## 9 2022-03-08 33005 Cheshire Cou~ NH 61.6 69.1 89.6 76085
## 10 2022-03-08 37159 Rowan County NC 43.8 52.5 74.1 142088
## # ... with 1,417,032 more rows, and 2 more variables: popgte18 <dbl>,
## # popgte65 <dbl>
Correlations data can also be run:
corrList20220309 <- corrVaxBurden(lstCD=cty_newdata_20220308,
dfVax=vaxPartialRaw_20220309,
minDateCD=c("2021-11-01", "2021-09-01"),
maxDateCD="2022-02-28"
)
##
## Will run with parameters:
## burdenVar: cpm dpm
## vaxVar: vxcpoppct vxcpoppct
## minDateCD: 2021-11-01 2021-09-01
## maxDateCD: 2022-02-28 2022-02-28
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -57553717 -2611410 -275030 2650218 123342871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68194.65 1887.85 36.12 <2e-16 ***
## vaxMetric 530.27 33.93 15.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7406000 on 3140 degrees of freedom
## Multiple R-squared: 0.07217, Adjusted R-squared: 0.07187
## F-statistic: 244.2 on 1 and 3140 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric,
## data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -57474876 -2595922 -291957 2584956 122543573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## type<25k 58253.65 6207.86 9.384 < 2e-16 ***
## type>500k 66890.28 3631.97 18.417 < 2e-16 ***
## type100k-500k 72662.44 3757.76 19.337 < 2e-16 ***
## type25k-100k 65791.22 4409.38 14.921 < 2e-16 ***
## vaxMetric:type<25k 731.64 145.96 5.013 5.67e-07 ***
## vaxMetric:type>500k 553.88 60.14 9.210 < 2e-16 ***
## vaxMetric:type100k-500k 429.71 69.68 6.167 7.84e-10 ***
## vaxMetric:type25k-100k 628.19 96.06 6.540 7.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7400000 on 3134 degrees of freedom
## Multiple R-squared: 0.9476, Adjusted R-squared: 0.9474
## F-statistic: 7081 on 8 and 3134 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1371927 -23067 44651 121402 729890
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1254.2057 22.8836 54.81 <2e-16 ***
## vaxMetric -9.5890 0.4779 -20.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 160100 on 3140 degrees of freedom
## Multiple R-squared: 0.1136, Adjusted R-squared: 0.1134
## F-statistic: 402.6 on 1 and 3140 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric,
## data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -885434 -65276 -4779 67115 1060610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## type<25k 1760.1655 74.0739 23.762 < 2e-16 ***
## type>500k 758.0391 29.0606 26.085 < 2e-16 ***
## type100k-500k 1304.8595 41.8238 31.199 < 2e-16 ***
## type25k-100k 1668.2572 50.2491 33.200 < 2e-16 ***
## vaxMetric:type<25k -11.1910 2.0925 -5.348 9.52e-08 ***
## vaxMetric:type>500k -3.2769 0.5600 -5.851 5.38e-09 ***
## vaxMetric:type100k-500k -9.4838 0.8902 -10.653 < 2e-16 ***
## vaxMetric:type25k-100k -11.0196 1.2902 -8.541 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 140000 on 3134 degrees of freedom
## Multiple R-squared: 0.8063, Adjusted R-squared: 0.8058
## F-statistic: 1631 on 8 and 3134 DF, p-value: < 2.2e-16
corrList20220309
## [[1]]
## # A tibble: 3,142 x 10
## fips countyName pop state vxcpoppct vxcgte18pct vxcgte65pct cpm dpm
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 42129 Westmorela~ 348899 PA 54.2 63.2 81.2 96676. 1238.
## 2 39109 Miami Coun~ 106987 OH 42.8 53 77.6 91619. 1318.
## 3 16045 Gem County 18112 ID 34.5 44.7 71 44611. 939.
## 4 05071 Johnson Co~ 26578 AR 44.4 54.8 67.7 74761. 452.
## 5 48411 San Saba C~ 6055 TX 32.7 40 54.4 38811. 991.
## 6 51710 Norfolk ci~ 242742 VA 48.2 56.7 79.4 69044. 470.
## 7 27007 Beltrami C~ 47188 MN 52.5 64.7 91.4 94982. 805.
## 8 05067 Jackson Co~ 16719 AR 34.4 41.2 66.9 98870. 658.
## 9 51735 Poquoson c~ 12271 VA 43.5 52.8 69.7 79945. 652.
## 10 47047 Fayette Co~ 41133 TN 49.2 57.9 76.3 97610. 1386.
## # ... with 3,132 more rows, and 1 more variable: type <chr>
##
## [[2]]
## # A tibble: 3,142 x 10
## fips countyName pop state vxcpoppct vxcgte18pct vxcgte65pct cpm dpm
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 51710 Norfolk ci~ 242742 VA 39.4 46.7 68.7 8.43e4 704.
## 2 01127 Walker Cou~ 63521 AL 32.7 41 68.1 1.46e5 1810.
## 3 28153 Wayne Coun~ 20183 MS 27.1 34.7 58.2 1.06e5 1288.
## 4 26045 Eaton Coun~ 110268 MI 47.4 56.5 77.4 1.41e5 1533.
## 5 48313 Madison Co~ 14284 TX 0 0 0 1.00e5 1050.
## 6 39095 Lucas Coun~ 428348 OH 39.2 48.2 67.2 1.22e5 1158.
## 7 29153 Ozark Coun~ 9174 MO 24.7 30 41 7.60e4 3052.
## 8 29071 Franklin C~ 103967 MO 44.5 55 81.4 9.62e4 1212.
## 9 21149 McLean Cou~ 9207 KY 40.9 51.6 76.7 1.56e5 1629.
## 10 13233 Polk County 42613 GA 9.4 11.7 9.8 9.84e4 1877.
## # ... with 3,132 more rows, and 1 more variable: type <chr>
Comparisons can be run between summed county and state data:
statePerCapita <- readFromRDS("cdc_daily_220304")$dfPerCapita
tempStateCompareList <- compareStateSummedCounty(dfState=statePerCapita,
dfCounty=cty_newdata_20220308$dfPerCapita,
aggData=TRUE,
dateThru="2022-02-28",
returnData=TRUE
)
## Warning: Removed 6 row(s) containing missing values (geom_path).
tempStateCompareList
## $dfState
## # A tibble: 3,160 x 6
## date name src value value7 state
## <date> <chr> <chr> <dbl> <dbl> <chr>
## 1 2020-01-01 cases state 0 NA Aggregated
## 2 2020-01-01 deaths state 0 NA Aggregated
## 3 2020-01-01 new_cases state 0 NA Aggregated
## 4 2020-01-01 new_deaths state 0 NA Aggregated
## 5 2020-01-02 cases state 0 NA Aggregated
## 6 2020-01-02 deaths state 0 NA Aggregated
## 7 2020-01-02 new_cases state 0 NA Aggregated
## 8 2020-01-02 new_deaths state 0 NA Aggregated
## 9 2020-01-03 cases state 0 NA Aggregated
## 10 2020-01-03 deaths state 0 NA Aggregated
## # ... with 3,150 more rows
##
## $dfCounty
## # A tibble: 3,052 x 6
## date name src value value7 state
## <date> <chr> <chr> <dbl> <dbl> <chr>
## 1 2020-01-25 cases county 751 750. Aggregated
## 2 2020-01-25 deaths county 1 1 Aggregated
## 3 2020-01-25 new_cases county 10 111. Aggregated
## 4 2020-01-25 new_deaths county 0 0.143 Aggregated
## 5 2020-01-26 cases county 759 758. Aggregated
## 6 2020-01-26 deaths county 1 1 Aggregated
## 7 2020-01-26 new_cases county 8 8 Aggregated
## 8 2020-01-26 new_deaths county 0 0 Aggregated
## 9 2020-01-27 cases county 769 766. Aggregated
## 10 2020-01-27 deaths county 1 1 Aggregated
## # ... with 3,042 more rows
The scoring metric is converted to functional form:
# Data for score similarity process
tempStateCompareList_v2 <- compareStateSummedCounty(dfState=statePerCapita,
dfCounty=cty_newdata_20220308$dfPerCapita,
inclStates=c(state.abb, "DC"),
dateThru="2022-02-28",
makePlot=FALSE,
returnData=TRUE
)
scoreSimilarity(tempStateCompareList_v2, minDate="2020-02-15", maxDate="2022-02-15", makeFacet=FALSE)
# Example states with meaningful disconnects on 1+ metrics
compareStateSummedCounty(dfState=statePerCapita,
dfCounty=cty_newdata_20220308$dfPerCapita,
inclStates=c("FL", "MO", "OK", "TX", "ME", "NE", "KY", "AL", "GA"),
dateThru="2022-02-28",
makePlot=TRUE,
returnData=FALSE
)
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
While cumulative deaths and cumulative cases are generally well aligned between sources, rolling 7-day deaths and cases are frequently divergent by source.
An integrated vaccines dataset can be created:
allState_20220309 <- integrateStateVaccine(vaxFix65_20220309, statePerCap=statePerCapita)
allState_20220309
## # A tibble: 23,001 x 11
## state date vxcpoppct vxcgte18pct vxcgte65pct ctypoppct ctygte18pct
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AK 2020-12-13 NA NA NA 0 0
## 2 AL 2020-12-13 NA NA NA 0 0
## 3 AR 2020-12-13 NA NA NA 0 0
## 4 AZ 2020-12-13 NA NA NA 0 0
## 5 CA 2020-12-13 NA NA NA 0 0
## 6 CO 2020-12-13 NA NA NA 0 0
## 7 CT 2020-12-13 NA NA NA 0 0
## 8 DC 2020-12-13 NA NA NA 0 0
## 9 DE 2020-12-13 NA NA NA 0 0
## 10 FL 2020-12-13 NA NA NA 0 0
## # ... with 22,991 more rows, and 4 more variables: ctygte65pct <dbl>,
## # pop <dbl>, popgte18 <dbl>, popgte65 <dbl>
Functionality for exploring vaccine evolution is included:
# Example for a single state
stateAgeVaxEvolution(allState_20220309, keyState="FL", minDate="2020-12-15", returnData=TRUE)
## Warning: Removed 5 row(s) containing missing values (geom_path).
## # A tibble: 2,694 x 6
## state date name value src age
## <chr> <date> <chr> <dbl> <chr> <chr>
## 1 FL 2020-12-15 vxcpoppct 0 State All
## 2 FL 2020-12-15 vxcgte18pct 0 State 18+
## 3 FL 2020-12-15 vxcgte65pct 0 State 65+
## 4 FL 2020-12-15 ctypoppct 0 Summed County All
## 5 FL 2020-12-15 ctygte18pct 0 Summed County 18+
## 6 FL 2020-12-15 ctygte65pct 0 Summed County 65+
## 7 FL 2020-12-16 vxcpoppct 0 State All
## 8 FL 2020-12-16 vxcgte18pct 0 State 18+
## 9 FL 2020-12-16 vxcgte65pct 0 State 65+
## 10 FL 2020-12-16 ctypoppct 0 Summed County All
## # ... with 2,684 more rows
# Example for multiple states without plotting
stateAgeVaxEvolution(allState_20220309, keyState=c("CA", "FL", "TX", "NY", "PA", "IL"), createPlot=FALSE)
## # A tibble: 16,236 x 6
## state date name value src age
## <chr> <date> <chr> <dbl> <chr> <chr>
## 1 CA 2020-12-13 vxcpoppct NA State All
## 2 CA 2020-12-13 vxcgte18pct NA State 18+
## 3 CA 2020-12-13 vxcgte65pct NA State 65+
## 4 CA 2020-12-13 ctypoppct 0 Summed County All
## 5 CA 2020-12-13 ctygte18pct 0 Summed County 18+
## 6 CA 2020-12-13 ctygte65pct 0 Summed County 65+
## 7 FL 2020-12-13 vxcpoppct NA State All
## 8 FL 2020-12-13 vxcgte18pct NA State 18+
## 9 FL 2020-12-13 vxcgte65pct NA State 65+
## 10 FL 2020-12-13 ctypoppct 0 Summed County All
## # ... with 16,226 more rows
# Example for multiple states with plotting
stateAgeVaxEvolution(allState_20220309, keyState=c("CT", "AR", "AZ"), minDate="2020-12-15", returnData=TRUE)
## Warning: Removed 5 row(s) containing missing values (geom_path).
## # A tibble: 8,082 x 6
## state date name value src age
## <chr> <date> <chr> <dbl> <chr> <chr>
## 1 AR 2020-12-15 vxcpoppct 0 State All
## 2 AR 2020-12-15 vxcgte18pct 0 State 18+
## 3 AR 2020-12-15 vxcgte65pct 0 State 65+
## 4 AR 2020-12-15 ctypoppct 0 Summed County All
## 5 AR 2020-12-15 ctygte18pct 0 Summed County 18+
## 6 AR 2020-12-15 ctygte65pct 0 Summed County 65+
## 7 AZ 2020-12-15 vxcpoppct 0 State All
## 8 AZ 2020-12-15 vxcgte18pct 0 State 18+
## 9 AZ 2020-12-15 vxcgte65pct 0 State 65+
## 10 AZ 2020-12-15 ctypoppct 0 Summed County All
## # ... with 8,072 more rows
Scores can be created for every state, reflecting differences in the vaccination data:
scoreVaxSimilarity(allState_20220309)
scoreVaxSimilarity(allState_20220309, minDate="2021-12-01", maxDate="2022-02-28", returnBaseData=TRUE)
## # A tibble: 459 x 7
## state ym age n rmse cdcState ctySum
## <chr> <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 AK 2021-12 18+ 31 2.30 67.2 65.0
## 2 AK 2021-12 65+ 31 2.25 83.9 81.7
## 3 AK 2021-12 All 31 1.84 55.6 53.7
## 4 AK 2022-01 18+ 31 2.48 68.6 66.2
## 5 AK 2022-01 65+ 31 2.35 84.4 82.1
## 6 AK 2022-01 All 31 1.98 57.3 55.4
## 7 AK 2022-02 18+ 28 1.88 71.4 69.6
## 8 AK 2022-02 65+ 28 1.86 85.0 83.2
## 9 AK 2022-02 All 28 1.48 60.2 58.7
## 10 AL 2021-12 18+ 31 4.26 57.2 53.0
## # ... with 449 more rows
stateAgeVaxEvolution(allState_20220309,
keyState=c("HI", "TX", "VA", "GA", "CO", "WV", "VT"),
createPlot = TRUE
)
## Warning: Removed 6 row(s) containing missing values (geom_path).
County-level burden process mapping is included:
dfRoll20220308 <- createBurdenCountyDate(cty_newdata_20220308,
maxDate="2022-02-28",
rollBy=months(c(0, -3, -6, -9)),
dateSpan=91
)
dfRoll20220308
## # A tibble: 12,568 x 7
## countyFIPS state asofDate tdpm tcpm dpm91 cpm91
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl>
## 1 01001 AL 2022-02-28 3472. 277614. 662. 89370.
## 2 01003 AL 2022-02-28 2867. 246280. 228. 75674.
## 3 01005 AL 2022-02-28 3767. 220570. 527. 70890.
## 4 01007 AL 2022-02-28 4421. 284674. 223. 90873.
## 5 01009 AL 2022-02-28 3770. 255249. 450. 69917.
## 6 01011 AL 2022-02-28 4851. 225621. 396. 74547.
## 7 01013 AL 2022-02-28 6119. 258896. 977. 82271.
## 8 01015 AL 2022-02-28 5176. 278491. 616. 79680.
## 9 01017 AL 2022-02-28 4571. 253533. 301. 79599.
## 10 01019 AL 2022-02-28 2978. 193694. 573. 72377.
## # ... with 12,558 more rows
makeBurdenDatePlot(dfRoll20220308, keyVar="cpm91", timeLabel="3-month", varCeiling=100000, varDivBy=1000)
makeBurdenDatePlot(dfRoll20220308, keyVar="dpm91", timeLabel="3-month", varCeiling=1500)
The latest county-level burden data are downloaded:
urlMapper[["usafCase"]] <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
urlMapper[["usafDeath"]] <- "https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv"
urlMapper[["usafPop"]] <- "https://static.usafacts.org/public/data/covid-19/covid_county_population_usafacts.csv"
readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220414.csv",
"usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220414.csv"
)
compareList <- list("usafCase"=readFromRDS("cty_newdata_20220308")$dfRaw$usafCase,
"usafDeath"=readFromRDS("cty_newdata_20220308")$dfRaw$usafDeath
)
# Use existing clusters
cty_newdata_20220414 <- readRunUSAFacts(maxDate="2022-04-12",
downloadTo=lapply(readList,
FUN=function(x) if(file.exists(x)) NA else x
),
readFrom=readList,
compareFile=compareList,
writeLog="./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log",
ovrwriteLog=TRUE,
useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
skipAssessmentPlots=FALSE,
brewPalette="Paired"
)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 38
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 1 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 122 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 38
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 1 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 118 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType cases new_cases n
## <chr> <dbl> <dbl> <dbl>
## 1 before 2.21e+10 7.84e+7 2589523
## 2 after 2.20e+10 7.80e+7 2548162
## 3 pctchg 5.36e- 3 5.01e-3 0.0160
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType deaths new_deaths n
## <chr> <dbl> <dbl> <dbl>
## 1 before 3.64e+8 978846 2589523
## 2 after 3.55e+8 930660 2548162
## 3 pctchg 2.39e-2 0.0492 0.0160
## NULL
# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20220414$useClusters,
caption="Includes only counties with 25k+ population",
brewPalette="viridis"
)
# Save the check file
saveToRDS(cty_newdata_20220414, ovrWriteError=FALSE)
A function is included for reading and fixing vaccines data, as well as running correlations to the burden data:
processCountyVaccines <- function(loc,
ctyList,
url="https://data.cdc.gov/api/views/8xkx-amqh/rows.csv?accessType=DOWNLOAD",
colsRepair=c("popgte65"),
...
) {
# FUNCTION ARGUMENTS:
# loc: the location of the downloaded vaccines data
# ctyList: processed list file of county-level burden data
# url: the location for obtaining the vaccines data
# colsRepair: columns in the raw vaccines data that require repairs to the population data
# ...: arguments passed to corrVaxBurden()
vaxRaw <- downloadCountyVaccines(loc=loc, url=url)
vaxFix <- repairVaxPopulation(vaxRaw, colsRepair=colsRepair)
corrList <- corrVaxBurden(lstCD=ctyList, dfVax=vaxRaw, ...)
# Return the key items
list(vaxRaw=vaxRaw, vaxFix=vaxFix, corrList=corrList)
}
cty_vaxdata_20220415 <- processCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20220415.csv",
ctyList=cty_newdata_20220414,
minDateCD=c("2021-11-01", "2021-09-01"),
maxDateCD="2022-03-31"
)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## Date = col_character(),
## FIPS = col_character(),
## Recip_County = col_character(),
## Recip_State = col_character(),
## SVI_CTGY = col_character(),
## Metro_status = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## Records from other than 50 states and DC:
## # A tibble: 10 x 2
## state n
## <chr> <int>
## 1 AS 479
## 2 FM 484
## 3 GU 971
## 4 MH 471
## 5 MP 481
## 6 PR 38548
## 7 PW 478
## 8 UNK 308
## 9 VI 1942
## 10 <NA> 11
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
##
## Count of NA records by column
## state FIPS popgte65_minpop popgte65_maxpop popgte65_nnA
## 0 0 0 0 0
## n
## 0
##
## Records where minimum and maximum population differ# A tibble: 0 x 5
## # ... with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## # maxpop <dbl>
##
##
##
## Will run with parameters:
## burdenVar: cpm dpm
## vaxVar: vxcpoppct vxcpoppct
## minDateCD: 2021-11-01 2021-09-01
## maxDateCD: 2022-03-31 2022-03-31
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -60027619 -2527753 -206065 2924501 119721627
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66910.18 2015.81 33.19 <2e-16 ***
## vaxMetric 609.76 36.14 16.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7481000 on 3132 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.08332, Adjusted R-squared: 0.08303
## F-statistic: 284.7 on 1 and 3132 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric,
## data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -57742661 -2581953 -363789 2682004 116556752
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## type<25k 59153.51 6320.38 9.359 < 2e-16 ***
## type>500k 57669.68 4189.22 13.766 < 2e-16 ***
## type100k-500k 71563.60 4088.55 17.503 < 2e-16 ***
## type25k-100k 66216.82 4507.32 14.691 < 2e-16 ***
## vaxMetric:type<25k 796.58 148.48 5.365 8.70e-08 ***
## vaxMetric:type>500k 757.00 69.15 10.947 < 2e-16 ***
## vaxMetric:type100k-500k 507.44 75.59 6.713 2.25e-11 ***
## vaxMetric:type25k-100k 691.21 98.12 7.045 2.28e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7467000 on 3126 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.9498, Adjusted R-squared: 0.9496
## F-statistic: 7388 on 8 and 3126 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1659513 -23948 48535 133724 882883
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1426.1342 25.7429 55.4 <2e-16 ***
## vaxMetric -11.6387 0.5363 -21.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 177100 on 3132 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.1307, Adjusted R-squared: 0.1305
## F-statistic: 471 on 1 and 3132 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric,
## data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -928924 -70829 -5885 74620 1385207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## type<25k 1963.9575 82.0871 23.925 < 2e-16 ***
## type>500k 865.1035 32.7854 26.387 < 2e-16 ***
## type100k-500k 1475.9119 47.3444 31.174 < 2e-16 ***
## type25k-100k 1865.0472 55.7475 33.455 < 2e-16 ***
## vaxMetric:type<25k -12.4743 2.3170 -5.384 7.83e-08 ***
## vaxMetric:type>500k -4.4739 0.6299 -7.102 1.51e-12 ***
## vaxMetric:type100k-500k -11.4038 1.0048 -11.350 < 2e-16 ***
## vaxMetric:type25k-100k -12.6591 1.4304 -8.850 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 154600 on 3126 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.8049, Adjusted R-squared: 0.8044
## F-statistic: 1612 on 8 and 3126 DF, p-value: < 2.2e-16
Similarities between summed county-level data and state-level data are explored:
# Data for score similarity process
tempStateCompareList_v2 <- compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita,
dfCounty=cty_newdata_20220414$dfPerCapita,
inclStates=c(state.abb, "DC"),
dateThru="2022-04-10",
makePlot=FALSE,
returnData=TRUE
)
scoreSimilarity(tempStateCompareList_v2, minDate="2020-02-15", maxDate="2022-04-10", makeFacet=FALSE)
# Example states with meaningful disconnects on 1+ metrics
compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita,
dfCounty=cty_newdata_20220414$dfPerCapita,
inclStates=c("GA", "FL", "NE", "OK", "KY", "ME", "VT", "WY", "TN", "RI", "MO", "MA", "KS"),
dateThru="2022-04-10",
makePlot=TRUE,
returnData=FALSE
)
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
An integrated vaccines dataset is created, along with similarity scores:
allState_20220415 <- integrateStateVaccine(cty_vaxdata_20220415$vaxFix,
statePerCap=readFromRDS("cdc_daily_220416")$dfPerCapita
)
allState_20220415
## # A tibble: 24,888 x 11
## state date vxcpoppct vxcgte18pct vxcgte65pct ctypoppct ctygte18pct
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AK 2020-12-13 NA NA NA 0 0
## 2 AL 2020-12-13 NA NA NA 0 0
## 3 AR 2020-12-13 NA NA NA 0 0
## 4 AZ 2020-12-13 NA NA NA 0 0
## 5 CA 2020-12-13 NA NA NA 0 0
## 6 CO 2020-12-13 NA NA NA 0 0
## 7 CT 2020-12-13 NA NA NA 0 0
## 8 DC 2020-12-13 NA NA NA 0 0
## 9 DE 2020-12-13 NA NA NA 0 0
## 10 FL 2020-12-13 NA NA NA 0 0
## # ... with 24,878 more rows, and 4 more variables: ctygte65pct <dbl>,
## # pop <dbl>, popgte18 <dbl>, popgte65 <dbl>
vaxDiff_20220415 <- scoreVaxSimilarity(allState_20220415,
minDate="2021-04-01",
maxDate="2022-03-31",
returnBaseData=TRUE
)
Plots of the larger differences are included:
stateAgeVaxEvolution(allState_20220415,
keyState=c("MA", "HI", "TX", "VA", "VT", "GA", "CO", "WV"),
createPlot = TRUE
)
## Warning: Removed 1 row(s) containing missing values (geom_path).
scoreVaxSimilarity(allState_20220415, minDate="2022-01-01", maxDate="2022-03-31")
stateAgeVaxEvolution(allState_20220415, keyState=c(state.abb, "DC"), createPlot = FALSE) %>%
filter(!complete.cases(.), date >= "2021-01-01") %>%
group_by(state, ym=customYYYYMM(date)) %>%
summarize(n=n()) %>%
pivot_wider(ym, names_from="state", values_from="n") %>%
arrange(desc(ym))
## `summarise()` has grouped output by 'state'. You can override using the `.groups` argument.
## # A tibble: 16 x 6
## ym CA HI MA TX VA
## <chr> <int> <int> <int> <int> <int>
## 1 2022-04 42 42 42 NA NA
## 2 2022-03 48 93 93 24 24
## 3 2022-02 NA 84 84 NA NA
## 4 2022-01 NA 93 93 NA NA
## 5 2021-12 NA 93 93 NA NA
## 6 2021-11 NA 90 90 NA NA
## 7 2021-10 NA 93 93 NA NA
## 8 2021-09 NA 90 90 NA NA
## 9 2021-08 NA 93 93 NA NA
## 10 2021-07 NA 93 93 NA NA
## 11 2021-06 NA 90 90 NA NA
## 12 2021-05 NA 93 93 NA NA
## 13 2021-04 NA 90 90 NA NA
## 14 2021-03 NA 93 93 NA NA
## 15 2021-02 NA 84 84 NA NA
## 16 2021-01 NA 93 93 NA NA
Function integrateStateVaccine() is updated to better manage sporadic NA values such as in TX and VA:
integrateStateVaccine <- function(vaxCounty,
statePerCap,
keyStates=c(state.abb, "DC"),
joinType=dplyr::right_join,
treatNAZero=TRUE
) {
# FUNCTION ARGUMENTS:
# vaxCounty: processed county-level vaccines data
# statePerCap: per-capita state level data frame
# keyStates: states to be included
# joinType: join to be made (statePerCap is 'left' and summed vaxCounty is 'right')
# treatNAZero: boolean, should NA values in vaccines be treated as 0 rather than throwing an aggregate NA?
# Create the aggregation function
if(isTRUE(treatNAZero)) aggFunc <- specNA(sum) else aggFunc <- sum
# Roll county-level data to state
dfTemp <- vaxCounty %>%
filter(state %in% all_of(keyStates), FIPS != "UNK") %>%
group_by(date, state) %>%
summarize(ctypoppct=aggFunc(vxcpoppct*pop)/sum(pop),
ctygte18pct=aggFunc(vxcgte18pct*popgte18)/sum(popgte18),
ctygte65pct=aggFunc(vxcgte65pct*popgte65)/sum(popgte65),
across(starts_with("pop"), sum, na.rm=TRUE),
.groups="drop"
)
# Integrate and return the data
statePerCap %>%
select(state, date, vxcpoppct, vxcgte18pct, vxcgte65pct) %>%
filter(state %in% all_of(keyStates)) %>%
joinType(dfTemp, by=c("state", "date"))
}
# Check that data are the same
all.equal(allState_20220415,
integrateStateVaccine(cty_vaxdata_20220415$vaxFix,
statePerCap=readFromRDS("cdc_daily_220416")$dfPerCapita,
treatNAZero=FALSE
)
)
## [1] TRUE
# Check differences in updated data
allState_20220415_v2 <- integrateStateVaccine(cty_vaxdata_20220415$vaxFix,
statePerCap=readFromRDS("cdc_daily_220416")$dfPerCapita,
treatNAZero=TRUE
)
# Confirm equality where not NA in original data
all.equal(allState_20220415[complete.cases(allState_20220415), ],
allState_20220415_v2[complete.cases(allState_20220415), ]
)
## [1] TRUE
# Display updated records
allState_20220415_v2[complete.cases(allState_20220415_v2) & !complete.cases(allState_20220415), ] %>%
mutate(ym=customYYYYMM(date)) %>%
filter(ym >= "2021-01-01") %>%
group_by(state, ym) %>%
summarize(across(where(is.numeric), mean), n=n(), .groups="drop")
## # A tibble: 19 x 12
## state ym vxcpoppct vxcgte18pct vxcgte65pct ctypoppct ctygte18pct
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CA 2022-03 71.0 80.6 89.0 69.2 78.4
## 2 CA 2022-04 71.3 80.9 89.4 69.5 78.6
## 3 MA 2021-02 0 0 0 3.72 4.62
## 4 MA 2021-03 12.2 15.1 32.7 11.8 14.7
## 5 MA 2021-04 27.7 34.3 68.9 24.6 30.5
## 6 MA 2021-05 46.0 56.4 80.9 41.0 50.2
## 7 MA 2021-06 58.2 69.5 85.0 52.2 61.9
## 8 MA 2021-07 62.9 73.6 86.6 56.1 65.4
## 9 MA 2021-08 64.9 75.6 87.4 57.9 67.2
## 10 MA 2021-09 67.0 77.7 88.4 59.9 69.1
## 11 MA 2021-10 68.8 79.6 89.5 61.5 70.8
## 12 MA 2021-11 70.4 81.4 91.2 63.0 72.4
## 13 MA 2021-12 73.3 82.9 92.4 65.7 73.8
## 14 MA 2022-01 75.6 84.2 93.1 67.7 74.9
## 15 MA 2022-02 77.1 85.2 93.5 69.1 75.8
## 16 MA 2022-03 77.9 85.8 93.7 69.9 76.3
## 17 MA 2022-04 78.4 86.1 94.3 70.2 76.6
## 18 TX 2022-03 60.6 71.8 86.2 59.9 70.7
## 19 VA 2022-03 72.4 81.5 91.4 57.0 65.3
## # ... with 5 more variables: ctygte65pct <dbl>, pop <dbl>, popgte18 <dbl>,
## # popgte65 <dbl>, n <int>
scoreVaxSimilarity(allState_20220415_v2, minDate="2022-01-01", maxDate="2022-03-31")
County-level burden process mapping is included:
dfRoll20220414 <- createBurdenCountyDate(cty_newdata_20220414,
maxDate="2022-03-31",
rollBy=months(c(0, -3, -6, -9)),
dateSpan=91
)
dfRoll20220414
## # A tibble: 12,568 x 7
## countyFIPS state asofDate tdpm tcpm dpm91 cpm91
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl>
## 1 01001 AL 2022-03-31 3777. 280209. 913. 84161.
## 2 01003 AL 2022-03-31 3024. 248233. 367. 70917.
## 3 01005 AL 2022-03-31 3929. 229118. 648. 74415.
## 4 01007 AL 2022-03-31 4510. 286595. 268. 85023.
## 5 01009 AL 2022-03-31 4081. 257670. 657. 64106.
## 6 01011 AL 2022-03-31 5148. 227997. 594. 64449.
## 7 01013 AL 2022-03-31 6582. 259718. 1337. 75072.
## 8 01015 AL 2022-03-31 5431. 284785. 748. 80199.
## 9 01017 AL 2022-03-31 4842. 254646. 421. 70578.
## 10 01019 AL 2022-03-31 3245. 195144. 763. 66728.
## # ... with 12,558 more rows
makeBurdenDatePlot(dfRoll20220414, keyVar="cpm91", timeLabel="3-month", varCeiling=100000, varDivBy=1000)
makeBurdenDatePlot(dfRoll20220414, keyVar="dpm91", timeLabel="3-month", varCeiling=1500)
Post-processing steps are consolidated to a function, so that the process includes:
The postProcessCountyData() includes:
# Updated to include date range
scoreVaxSimilarity <- function(df,
keyStates=c(state.abb, "DC"),
minDate=NULL,
maxDate=NULL,
returnBaseData=FALSE
) {
# FUNCTION ARGUMENTS:
# df: a processed file from integrateStateVaccine()
# keyStates: states to include
# minDate: earliest date to use for scoring (NULL means use all)
# maxDate: latest date to use for scoring (NULL means use all)
# returnBaseData: boolean, should dfBase be returned?
# Set minDate and maxDate if NULL
if(is.null(minDate)) minDate <- min(df$date, na.rm=TRUE)
if(is.null(maxDate)) maxDate <- max(df$date, na.rm=TRUE)
dfBase <- stateAgeVaxEvolution(df, keyState=keyStates, createPlot = FALSE) %>%
mutate(value=ifelse(is.na(value), 0, value), src=ifelse(src=="State", "cdcState", "ctySum")) %>%
pivotData(c("state", "date", "age"), nameVar="src", toLonger=FALSE) %>%
filter(date >= minDate, date <= maxDate) %>%
mutate(ym=customYYYYMM(date)) %>%
group_by(state, ym, age) %>%
summarize(n=n(),
rmse=sqrt(mean((cdcState-ctySum)**2)),
cdcState=mean(cdcState),
ctySum=mean(ctySum),
.groups="drop"
)
p1 <- dfBase %>%
group_by(state, age) %>%
summarize(rmse=mean(rmse), .groups="drop") %>%
ggplot(aes(x=fct_reorder(state, rmse), y=rmse)) +
geom_col(fill="lightblue") +
coord_flip() +
facet_wrap(~age) +
labs(x=NULL,
y="RMSE (Vaccinated by State/Age difference by source)",
title="Difference in vaccinated by state data by source",
subtitle=paste0("Date range: ", minDate, " to ", maxDate)
)
print(p1)
if(isTRUE(returnBaseData)) return(dfBase)
}
# Updated to handle lst passed as a perCapita file
makeBurdenSummary <- function(lst,
groupVar=c("countyFIPS", "state"),
numVarFinal=c("tdpm", "tcpm"),
numVarSum=c("dpm", "cpm"),
keyDate=NULL,
dateRange=28
) {
# FUNCTION ARGUMENTS
# lst: list of processed county burden data (or extracted dfPerCapita file)
# groupVar: grouping variables for the final dataset
# numVarFinal: numeric variables to pull data from the key date
# numVarSum: numeric variables to sum from the key date interval
# keyDate: the key date for the summaries (NULL means use maximum in data)
# dateRange: number of days to include in the numeric interval summaries
# Extract perCapita data if passed as a list
if("list" %in% class(lst)) lst <- lst[["dfPerCapita"]]
# Find keyDate if not provided, convert to date if not already
if(is.null(keyDate)) keyDate <- lst %>% pull(date) %>% max()
if(!("Date" %in% class(keyDate))) keyDate <- as.Date(keyDate)
# Create summary
df <- lst %>%
group_by_at(all_of(groupVar)) %>%
summarize(asofDate=keyDate,
across(all_of(numVarFinal), .fns=~sum(ifelse(date==keyDate, .x, 0))),
across(all_of(numVarSum),
.fns=~sum(ifelse(date>keyDate-dateRange & date<=keyDate, .x, 0)),
.names=paste0("{.col}", as.character(dateRange))
),
.groups="drop"
)
# Return the data frame
df
}
postProcessCountyData <- function(lstCtyBurden,
lstCtyVax,
lstState,
maxDate=NULL,
minDateBurden="2020-02-15",
minDateVax="2021-04-01"
) {
# FUNCTION ARGUMENTS:
# lstCtyBurden: list of processed county-level burden data (or a dfPerCapita file from this list)
# lstCtyVax: list of processed county-level vaccines data (or a vaxFix file from this list)
# lstState: list of processed state-level burden data (or a dfPerCapita file from this list)
# maxDate: maximum date to use for plotting (NULL means latest date in both lstCty and lstState)
# minDateBurden: earliest date for scoring burden similarity across files
# minDateVax: earliest date for scoring vaccine similarity across files
# Extract the relevant perCapita and vaxFix data if needed
if("list" %in% class(lstCtyBurden)) lstCtyBurden <- lstCtyBurden[["dfPerCapita"]]
if("list" %in% class(lstState)) lstState <- lstState[["dfPerCapita"]]
if("list" %in% class(lstCtyVax)) lstCtyVax <- lstCtyVax[["vaxFix"]]
# Get maxDate if not provided
if(is.null(maxDate)) maxDate <- min(max(lstCtyBurden$date, na.rm=TRUE), max(lstState$date, na.rm=TRUE))
cat("\nParameter maxDate is:", as.character(maxDate), "\n\n")
# Data for score similarity process
dfCompare <- compareStateSummedCounty(dfState=lstState,
dfCounty=lstCtyBurden,
inclStates=c(state.abb, "DC"),
dateThru=maxDate,
makePlot=FALSE,
returnData=TRUE
)
scoreSimilarity(dfCompare, minDate=minDateBurden, maxDate=maxDate, makeFacet=FALSE)
# Check differences in data sources
dfAllState <- integrateStateVaccine(lstCtyVax, statePerCap=lstState, treatNAZero=TRUE)
vaxDiff <- scoreVaxSimilarity(dfAllState, minDate=minDateVax, maxDate=maxDate, returnBaseData=TRUE)
# Create county-level burden data by quarters
dfRoll91 <- createBurdenCountyDate(lstCtyBurden,
maxDate=maxDate,
rollBy=months(c(0, -3, -6, -9)),
dateSpan=91
)
makeBurdenDatePlot(dfRoll91, keyVar="cpm91", timeLabel="3-month", varCeiling=100000, varDivBy=1000) %>% print()
makeBurdenDatePlot(dfRoll91, keyVar="dpm91", timeLabel="3-month", varCeiling=1500) %>% print()
# Return the key elements
list(dfCompare=dfCompare, dfAllState=dfAllState, vaxDiff=vaxDiff, dfRoll91=dfRoll91)
}
cty_postdata_20220414 <- postProcessCountyData(lstCtyBurden=cty_newdata_20220414$dfPerCapita,
lstCtyVax=cty_vaxdata_20220415$vaxFix,
lstState=readFromRDS("cdc_daily_220416")$dfPerCapita
)
##
## Parameter maxDate is: 2022-04-11
And stand-alone functions include:
# 1. Plot states with meaningful disconnects on 1+ metrics
compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita,
dfCounty=cty_newdata_20220414$dfPerCapita,
inclStates=c("GA", "FL", "NE", "OK", "KY", "ME", "VT", "WY", "TN", "RI", "MO", "MA", "KS"),
dateThru="2022-04-10",
makePlot=TRUE,
returnData=FALSE
)
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
# 2. Plot differences in vaccines data if needed
stateAgeVaxEvolution(cty_postdata_20220414$dfAllState,
keyState=c("MA", "HI", "TX", "VA", "VT", "GA", "CO", "WV"),
createPlot = TRUE
)
## Warning: Removed 1 row(s) containing missing values (geom_path).
# 3. Check vaccine similarity scoring on a different time period
scoreVaxSimilarity(cty_postdata_20220414$dfAllState, minDate="2022-01-01", maxDate="2022-03-31")
# 4. Check for states with missing data (after adjustments, should only be HI which does not report by county)
stateAgeVaxEvolution(cty_postdata_20220414$dfAllState, keyState=c(state.abb, "DC"), createPlot = FALSE) %>%
filter(!complete.cases(.), date >= "2021-01-01") %>%
group_by(state, ym=customYYYYMM(date), .groups="drop") %>%
summarize(n=n()) %>%
pivot_wider(ym, names_from="state", values_from="n") %>%
arrange(desc(ym))
## `summarise()` has grouped output by 'state', 'ym'. You can override using the `.groups` argument.
## # A tibble: 16 x 2
## # Groups: ym [16]
## ym HI
## <chr> <int>
## 1 2022-04 42
## 2 2022-03 93
## 3 2022-02 84
## 4 2022-01 93
## 5 2021-12 93
## 6 2021-11 90
## 7 2021-10 93
## 8 2021-09 90
## 9 2021-08 93
## 10 2021-07 93
## 11 2021-06 90
## 12 2021-05 93
## 13 2021-04 90
## 14 2021-03 93
## 15 2021-02 84
## 16 2021-01 93
# 5. Additional rolling data as needed
dfRoll_28 <- createBurdenCountyDate(cty_newdata_20220414,
maxDate="2022-04-10",
rollBy=months(c(0, -1, -2, -3)),
dateSpan=28
)
makeBurdenDatePlot(dfRoll_28, keyVar="cpm28", timeLabel="1-month", varCeiling=50000, varDivBy=1000)
makeBurdenDatePlot(dfRoll_28, keyVar="dpm28", timeLabel="1-month", varCeiling=500)
Function compareStateSummedCounty() is updated to accept a processed frame:
pivotStateBurdenData <- function(df,
inclStates,
varKeep=c("date", "state", "tot_cases", "new_cases", "tot_deaths", "new_deaths"),
varRename=c("tot_cases"="cases", "tot_deaths"="deaths"),
dateThru=NULL
) {
# FUNCTION ARGUMENTS:
# df: a state-level burden data frame
# inclStates: states to be included
# varKeep: variables to be kept from state burden frame
# varRename: variables to be renamed
# dateThru: maximum date for the analysis (NULL means use max(date) from df)
# Create and return pivoted state-level data
df %>%
colSelector(vecSelect=varKeep) %>%
colRenamer(vecRename=varRename) %>%
filter(state %in% all_of(inclStates),
date <= if(is.null(dateThru)) max(date) else as.Date(dateThru)
) %>%
pivot_longer(-c(date, state)) %>%
mutate(value=ifelse(is.na(value), 0, value)) %>%
arrange(date, state, name) %>%
group_by(state, name) %>%
mutate(value7=zoo::rollmean(value, k=7, fill=NA, align="center"), src="state") %>%
ungroup()
}
createSummedCountyBurdenData <- function(df,
inclStates,
varKeep=c("state", "countyFIPS", "date",
"cases", "new_cases", "deaths", "new_deaths"
),
varRename=c(),
dateThru=NULL
) {
# FUNCTION ARGUMENTS:
# df: a state-level burden data frame
# inclStates: states to be included
# varKeep: variables to be kept from state burden frame
# varRename: variables to be renamed
# dateThru: maximum date for the analysis (NULL means use max(date) from df)
# Create and return summed and pivoted county-level data
df %>%
filter(state %in% all_of(inclStates),
date <= if(is.null(dateThru)) max(date) else as.Date(dateThru)
) %>%
colSelector(vecSelect=varKeep) %>%
colRenamer(vecRename=varRename) %>%
pivot_longer(-c(state, countyFIPS, date)) %>%
arrange(state, countyFIPS, date) %>%
group_by(state, countyFIPS, name) %>%
mutate(value7=zoo::rollmean(value, k=7, fill=NA, align="center")) %>%
ungroup() %>%
filter(state %in% all_of(inclStates)) %>%
group_by(state, date, name) %>%
summarize(across(c(value, value7), .fns=sum), .groups="drop") %>%
filter(!is.na(value7)) %>%
mutate(src="county")
}
compareStateSummedCounty <- function(dfState=NULL,
dfCounty=NULL,
lstAll=NULL,
aggData=FALSE,
inclStates=if(isTRUE(aggData)) c(state.abb, "DC") else NULL,
dateThru=NULL,
makePlot=TRUE,
createData=TRUE,
returnData=FALSE
) {
# FUNCTION ARGUMENTS:
# dfState: processed state-level metrics
# dfCounty: processed county-level metrics
# lstAll: list containing processed dfState and dfCounty
# aggData: boolean, should data be aggregated (FALSE means one plot series per state)
# inclStates: character vector of states to include
# dateThru: latest date for analyzsis (NULL means use all data)
# makePlot: boolean, should plots be created for each state?
# createData: boolean, does dfAll need to be built from dfState and dfCounty? FALSE means use dfAll
# returnData: boolean, should a list of processed dfState and processed dfCounty be returned?
# Check that data passed matches parameters
if(isTRUE(createData)) {
if(is.null(dfState) | is.null(dfCounty)) stop("\nNeed to pass dfState and dfCounty when createData=TRUE\n")
if(!is.null(lstAll)) cat("\nlstAll is ignored, to be built from dfState and dfCounty when createData=TRUE\n")
}
if(!isTRUE(createData)) {
if(is.null(lstAll)) stop("\nNeed to pass lstAll when createData=FALSE\n")
if(!is.null(dfState)) cat("\ndfState ignored; using lstAll due to createData=FALSE\n")
if(!is.null(dfCounty)) cat("\ndfCounty ignored; using lstAll due to createData=FALSE\n")
}
# Check that at least one state has been passed
if(is.null(inclStates) | length(inclStates)==0) {
cat("\nNo states passed to compareStateSummedCounty, returning without running function\n")
return()
}
# Create or extract data as directed by parameter createData
if(isTRUE(createData)) {
dfState <- pivotStateBurdenData(dfState, inclStates=inclStates, dateThru=dateThru)
dfCounty <- createSummedCountyBurdenData(dfCounty, inclState=inclStates, dateThru=dateThru)
} else {
dfState <- lstAll[["dfState"]]
dfCounty <- lstAll[["dfCounty"]]
}
# If data are to be aggregated, perform the aggregation
if(isTRUE(aggData)) {
tempAgg <- function(df) {
df %>%
group_by(date, name, src) %>%
summarize(value=specNA(sum)(value, na.rm=TRUE),
value7=specNA(sum)(value7, na.rm=TRUE),
.groups="drop"
) %>%
mutate(state="Aggregated")
}
dfState <- tempAgg(dfState)
dfCounty <- tempAgg(dfCounty)
inclStates <- "Aggregated"
}
if(isTRUE(makePlot)) {
for(thisState in all_of(inclStates)) {
p1 <- dfCounty %>%
filter(state %in% all_of(thisState)) %>%
bind_rows(filter(dfState, state %in% all_of(thisState))) %>%
ggplot(aes(x=date, y=value7)) +
geom_line(aes(group=src, color=src)) +
facet_wrap(~name, scales="free_y") +
labs(x=NULL,
y="Rolling 7-day mean",
title="Summed county burden by metric (7-day mean)",
subtitle=paste0(thisState, " counties")
) +
scale_color_discrete("Source")
print(p1)
}
}
if(returnData) list(dfState=dfState, dfCounty=dfCounty)
}
# Check that results are the same when creating the full frame (default)
chk1 <- compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita,
dfCounty=cty_newdata_20220414$dfPerCapita,
inclStates=c(state.abb, "DC"),
dateThru="2022-04-11",
makePlot=FALSE,
returnData=TRUE
)
identical(chk1, cty_postdata_20220414$dfCompare)
## [1] TRUE
# Check that results are the same when using the existing frame
chk2 <- compareStateSummedCounty(lstAll=cty_postdata_20220414$dfCompare,
inclStates=c(state.abb, "DC"),
dateThru="2022-04-11",
createData=FALSE,
makePlot=FALSE,
returnData=TRUE
)
identical(chk1, chk2)
## [1] TRUE
# Example plotting for select states
compareStateSummedCounty(lstAll=cty_postdata_20220414$dfCompare,
inclStates=c("GA", "FL", "NE"),
dateThru="2022-04-11",
createData=FALSE,
makePlot=TRUE,
returnData=FALSE
)
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
An overall function for additional post-processing is written:
additionalCountyPostProcess <- function(lstPost,
p1CompareStates=c(),
p1AggData=FALSE,
p2VaxStates=c(),
p3VaxTimes=c(),
p4DF=NULL,
p4MaxDate=NULL,
p4RollBy=months(c(0, -1, -2, -3)),
p4DateSpan=28,
p4CPMCeiling=50000,
p4DPMCeiling=500
) {
# FUNCTION ARGUMENTS:
# lstPost: list of post-processed county data
# p1CompareStates: states that should be compared vs. summed county
# p1AggData: boolean, should the comparison states all be aggregated to a single comparison?
# p2VaxStates: states that should be compared for vaccine evolution
# p3VaxTimes: character vector of form c(minDate, maxDate) for time period to score vaccine similarity
# p4DF: data frame for creating rolling data (NULL means do not run)
# p4MaxDate: maximum date for rolling analysis (NULL means use maximum date in p4DF minus 1 day)
# p4RollBy: time periods to roll back for analysis
# p4DateSpan: size of windows for rolling analysis
# p4CPMCeiling: ceiling for plots on CPM (all values at or above this will be the same color)
# p4DPMCeiling: ceiling for plots on DPM (all values at or above this will be the same color)
# 1. Plotting state vs. summed county for key states
if(length(p1CompareStates) > 0) {
compareStateSummedCounty(lstAll=lstPost[["dfCompare"]],
inclStates=p1CompareStates,
createData=FALSE,
aggData=p1AggData
)
}
# 2. Plot differences in vaccines data if needed
if(length(p2VaxStates) > 0)
stateAgeVaxEvolution(lstPost[["dfAllState"]], keyState=p2VaxStates)
# 3. Check vaccine similarity scoring on a different time period
if(length(p3VaxTimes) > 0) {
if(length(p3VaxTimes) != 2 | p3VaxTimes[2] < p3VaxTimes[1])
cat("\np3VaxTimes should be c(minDate, maxDate), skipping this step due to bad parameter\n")
else
scoreVaxSimilarity(lstPost[["dfAllState"]], minDate=p3VaxTimes[1], maxDate=p3VaxTimes[2])
}
# 4. Additional rolling data as needed
if(!is.null(p4DF)) {
if(is.null(p4MaxDate)) p4MaxDate <- max(p4DF$date) - lubridate::days(1)
dfRoll <- createBurdenCountyDate(p4DF, maxDate=p4MaxDate, rollBy=p4RollBy, dateSpan=p4DateSpan)
makeBurdenDatePlot(dfRoll,
keyVar=paste0("cpm", p4DateSpan),
timeLabel=paste0(p4DateSpan, "-day"),
varCeiling=p4CPMCeiling,
varDivBy=1000
) %>%
print()
makeBurdenDatePlot(dfRoll,
keyVar=paste0("dpm", p4DateSpan),
timeLabel=paste0(p4DateSpan, "-day"),
varCeiling=p4DPMCeiling
) %>%
print()
}
}
# Nothing will run
additionalCountyPostProcess(cty_postdata_20220414)
# Just burden comparisons
additionalCountyPostProcess(cty_postdata_20220414, p1CompareStates=c("GA", "FL", "NE"))
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
additionalCountyPostProcess(cty_postdata_20220414, p1CompareStates=c(state.abb, "DC"), p1AggData=TRUE)
## Warning: Removed 6 row(s) containing missing values (geom_path).
# Just vaccine comparisons
additionalCountyPostProcess(cty_postdata_20220414, p2VaxStates=c("MA", "HI", "TX"))
## Warning: Removed 1465 row(s) containing missing values (geom_path).
# Just scoring updates (and errors)
additionalCountyPostProcess(cty_postdata_20220414, p3VaxTimes=c("2022-01-01"))
##
## p3VaxTimes should be c(minDate, maxDate), skipping this step due to bad parameter
additionalCountyPostProcess(cty_postdata_20220414, p3VaxTimes=c("2022-04-10", "2022-01-01"))
##
## p3VaxTimes should be c(minDate, maxDate), skipping this step due to bad parameter
additionalCountyPostProcess(cty_postdata_20220414, p3VaxTimes=sort(c("2022-04-10", "2022-01-01")))
# Just new rolling data
additionalCountyPostProcess(p4DF=cty_newdata_20220414$dfPerCapita)
additionalCountyPostProcess(p4DF=cty_newdata_20220414$dfPerCapita,
p4RollBy=months(c(0, -3, -6, -9)),
p4DateSpan=91,
p4CPMCeiling=100000,
p4DPMCeiling=1000
)
The latest county-level burden data are downloaded:
urlMapper[["usafCase"]] <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
urlMapper[["usafDeath"]] <- "https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv"
urlMapper[["usafPop"]] <- "https://static.usafacts.org/public/data/covid-19/covid_county_population_usafacts.csv"
readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220507.csv",
"usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220507.csv"
)
compareList <- list("usafCase"=readFromRDS("cty_newdata_20220414")$dfRaw$usafCase,
"usafDeath"=readFromRDS("cty_newdata_20220414")$dfRaw$usafDeath
)
# Use existing clusters
cty_newdata_20220507 <- readRunUSAFacts(maxDate="2022-05-05",
downloadTo=lapply(readList,
FUN=function(x) if(file.exists(x)) NA else x
),
readFrom=readList,
compareFile=compareList,
writeLog="./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log",
ovrwriteLog=TRUE,
useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
skipAssessmentPlots=FALSE,
brewPalette="Paired"
)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 24
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 5 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 95 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 24
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 4 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 88 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType cases new_cases n
## <chr> <dbl> <dbl> <dbl>
## 1 before 2.40e+10 7.96e+7 2666155
## 2 after 2.39e+10 7.91e+7 2623570
## 3 pctchg 5.37e- 3 6.20e-3 0.0160
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType deaths new_deaths n
## <chr> <dbl> <dbl> <dbl>
## 1 before 3.87e+8 990515 2666155
## 2 after 3.77e+8 940572 2623570
## 3 pctchg 2.55e-2 0.0504 0.0160
## NULL
# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20220507$useClusters,
caption="Includes only counties with 25k+ population",
brewPalette="viridis"
)
# Save the refreshed file
saveToRDS(cty_newdata_20220507, ovrWriteError=FALSE)
Vaccines data are also updated:
cty_vaxdata_20220508 <- processCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20220508.csv",
ctyList=cty_newdata_20220507,
minDateCD=c("2022-02-01", "2022-02-01"),
maxDateCD="2022-04-30"
)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## Date = col_character(),
## FIPS = col_character(),
## Recip_County = col_character(),
## Recip_State = col_character(),
## SVI_CTGY = col_character(),
## Metro_status = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## Records from other than 50 states and DC:
## # A tibble: 10 x 2
## state n
## <chr> <int>
## 1 AS 502
## 2 FM 507
## 3 GU 1017
## 4 MH 494
## 5 MP 504
## 6 PR 40365
## 7 PW 501
## 8 UNK 308
## 9 VI 2034
## 10 <NA> 34
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).
##
## Count of NA records by column
## state FIPS popgte65_minpop popgte65_maxpop popgte65_nnA
## 0 0 0 0 0
## n
## 0
##
## Records where minimum and maximum population differ# A tibble: 0 x 5
## # ... with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## # maxpop <dbl>
##
##
##
## Will run with parameters:
## burdenVar: cpm dpm
## vaxVar: vxcpoppct vxcpoppct
## minDateCD: 2022-02-01 2022-02-01
## maxDateCD: 2022-04-30 2022-04-30
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -16913246 -865056 -142320 1156343 29100686
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18410.802 845.223 21.782 <2e-16 ***
## vaxMetric 5.315 13.655 0.389 0.697
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2841000 on 3132 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 4.837e-05, Adjusted R-squared: -0.0002709
## F-statistic: 0.1515 on 1 and 3132 DF, p-value: 0.6971
##
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric,
## data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -15433621 -1306806 -474319 813529 28327596
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## type<25k 15900.16 2746.17 5.790 7.74e-09 ***
## type>500k 8211.07 1738.47 4.723 2.42e-06 ***
## type100k-500k 8147.59 1751.84 4.651 3.44e-06 ***
## type25k-100k 15344.40 1967.89 7.797 8.54e-15 ***
## vaxMetric:type<25k 136.60 57.61 2.371 0.0178 *
## vaxMetric:type>500k 135.40 25.81 5.246 1.66e-07 ***
## vaxMetric:type100k-500k 186.48 29.33 6.358 2.34e-10 ***
## vaxMetric:type25k-100k 123.48 38.62 3.197 0.0014 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2753000 on 3126 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.8307, Adjusted R-squared: 0.8302
## F-statistic: 1917 on 8 and 3126 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -524335 -23570 1256 31450 513584
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 856.3315 17.5443 48.81 <2e-16 ***
## vaxMetric -9.4468 0.2834 -33.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58980 on 3132 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.2618, Adjusted R-squared: 0.2616
## F-statistic: 1111 on 1 and 3132 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric,
## data = dfReg, weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -520669 -28168 -3747 27767 539544
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## type<25k 738.7731 58.3126 12.669 < 2e-16 ***
## type>500k 803.2666 36.9150 21.760 < 2e-16 ***
## type100k-500k 781.5071 37.1988 21.009 < 2e-16 ***
## type25k-100k 754.0531 41.7865 18.045 < 2e-16 ***
## vaxMetric:type<25k -5.8518 1.2233 -4.783 1.80e-06 ***
## vaxMetric:type>500k -8.8002 0.5481 -16.057 < 2e-16 ***
## vaxMetric:type100k-500k -8.3533 0.6228 -13.413 < 2e-16 ***
## vaxMetric:type25k-100k -6.6905 0.8201 -8.159 4.86e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58460 on 3126 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.7375, Adjusted R-squared: 0.7368
## F-statistic: 1098 on 8 and 3126 DF, p-value: < 2.2e-16
# Save the refreshed file
saveToRDS(cty_vaxdata_20220508, ovrWriteError=FALSE)
County-level data are post-processed:
cty_postdata_20220507 <- postProcessCountyData(lstCtyBurden=cty_newdata_20220507$dfPerCapita,
lstCtyVax=cty_vaxdata_20220508$vaxFix,
lstState=readFromRDS("cdc_daily_220501")$dfPerCapita
)
##
## Parameter maxDate is: 2022-04-30
# Save the refreshed file
saveToRDS(cty_postdata_20220507, ovrWriteError=FALSE)
Additional post-processing steps are run:
# Step 1a: Burden comparisons for aggregated states
additionalCountyPostProcess(cty_postdata_20220507, p1CompareStates=c(state.abb, "DC"), p1AggData=TRUE)
## Warning: Removed 6 row(s) containing missing values (geom_path).
# Step 1: Burden aggregation for key states
# Step 2: vaccine comparisons
# Step 3: Scoring updates (and errors)
# Step 4: New rolling data (28-day default with ceilings 50000 CPM, 500 DPM)
additionalCountyPostProcess(cty_postdata_20220507,
p1CompareStates=c("GA", "FL", "NE"),
p2VaxStates=c("MA", "HI", "TX", "VA", "VT", "GA", "CO", "SD"),
p3VaxTimes=sort(c("2021-05-01", "2022-04-30")),
p4DF=cty_newdata_20220507$dfPerCapita
)
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 6 row(s) containing missing values (geom_path).
## Warning: Removed 8 row(s) containing missing values (geom_path).
Evolution of burden and vaccines is compared for select counties:
# Sample list of counties
ctyList <- c("01089", "29510", "34027")
# Extract name and population data
tmpNamePop <- cty_newdata_20220507$countyData %>%
filter(countyFIPS %in% ctyList) %>%
mutate(fullName=paste0(countyName, ", ", state, " (pop: ", round(pop/1000), "k)"))
# Extract burden per-capita data
tmpBurden <- cty_newdata_20220507$dfPerCapita %>%
filter(countyFIPS %in% ctyList)
# Extract vaccines per-capita data
tmpVax <- cty_vaxdata_20220508$vaxFix %>%
filter(FIPS %in% ctyList)
# Create integrated database of burden and vaccines
tmpCombine <- tmpBurden %>%
select(countyFIPS, date, cpm7, dpm7, tcpm7, tdpm7) %>%
full_join(tmpVax %>% select(countyFIPS=FIPS, date, vxcpoppct, vxcgte18pct, vxcgte65pct),
by=c("countyFIPS", "date")
) %>%
pivot_longer(-c(countyFIPS, date))
# Create plot from data
tmpNamePopVec <- tmpNamePop$fullName %>% purrr::set_names(tmpNamePop$countyFIPS)
tmpVars <- c("dpm7", "tdpm7", "cpm7", "tcpm7", "vxcpoppct", "vxcgte65pct")
tmpVarNames <- c("1. Death per million", "2. Death per million (cum)",
"3. Case per million", "4. Case per million (cum)",
"5. % Vaccinated (all)", "6. % Vaccinated (65+)"
) %>%
purrr::set_names(tmpVars)
tmpCombine %>%
filter(name %in% all_of(tmpVars), !is.na(value)) %>%
ggplot(aes(x=date, y=value)) +
geom_line(aes(color=tmpNamePopVec[countyFIPS], group=countyFIPS)) +
facet_wrap(~tmpVarNames[name], scales="free_y", ncol=2) +
labs(x=NULL, y=NULL, title="Evolution of metrics by select county") +
scale_color_discrete("")
A function is written for the extracts:
compareBurdenVaxCounty <- function(lstBurden,
lstVax,
ctyIDs,
burdenVars=c("dpm7", "tdpm7", "cpm7", "tcpm7"),
vaxVars=c("vxcpoppct", "vxcgte65pct"),
plotVars=c(burdenVars, vaxVars),
plotVarNames=c("1. Death per million", "2. Death per million (cum)",
"3. Case per million", "4. Case per million (cum)",
"5. % Vaccinated (all)", "6. % Vaccinated (65+)"
),
printPlot=TRUE,
returnData=!isTRUE(printPlot)
) {
# FUNCTION ARGUMENTS:
# lstBurden: processed file containing county-level burden data
# lstVax: processed file containing county-level vaccines data
# ctyIDs: vector of county-FIPS to process OR named list of county-FIPS to amalgamate
# if vector, each plotted separately with county name as legend and color
# if named list, each element amalgamated with list name as legend and color
# burdenVars: variables to plot from burden data
# vaxVars: variables to plot from vaccines data
# plotVars: variables to be plotted
# plotVarNames: names to be associated to plotVars
# printPlot: boolean, should plot be created and printed?
# returnData: boolean, should data frame be returned?
# Create valid list of countyFIPS for processing
# If passed as list, convert to vector; convert all to 5-string characters
if("list" %in% class(ctyIDs)) ctyUse <- zeroPad5(unname(unlist(ctyIDs)))
else ctyUse <- zeroPad5(ctyIDs)
# Extract name and population data
dfNamePop <- lstBurden[["countyData"]] %>%
filter(countyFIPS %in% ctyUse) %>%
mutate(fullName=paste0(countyName, ", ", state, " (pop: ", round(pop/1000), "k)"))
# Extract burden per-capita data
dfBurden <- lstBurden[["dfPerCapita"]] %>%
filter(countyFIPS %in% ctyUse)
# Extract vaccines per-capita data
dfVax <- lstVax[["vaxFix"]] %>%
filter(FIPS %in% ctyUse)
# Create integrated database of burden and vaccines
dfCombine <- dfBurden %>%
select(countyFIPS, date, all_of(burdenVars)) %>%
full_join(dfVax %>% select(countyFIPS=FIPS, date, all_of(vaxVars)),
by=c("countyFIPS", "date")
) %>%
pivot_longer(-c(countyFIPS, date))
# If data to be amalgamated, run here
if("list" %in% class(ctyIDs)) {
# Create mapping frame
mapFrame <- map_dfr(names(ctyIDs), .f=function(x) tibble::tibble(mapName=x, countyFIPS=ctyIDs[[x]]))
# Add population data and mapped name
dfCombine <- dfCombine %>%
filter(!is.na(value)) %>%
inner_join(select(dfNamePop, countyFIPS, pop), by="countyFIPS") %>%
inner_join(mapFrame, by="countyFIPS") %>%
group_by(mapName, date, name) %>%
summarize(value=sum(value*pop)/sum(pop), pop=sum(pop), .groups="drop")
}
# Create vectors for mapping countyFIPS and metric abbreviations to descriptive names
varNameVec <- plotVarNames %>% purrr::set_names(plotVars)
# Create vectors for mapping countyFIPS to descriptive names (only relevant if not amalgamated)
namePopVec <- dfNamePop$fullName %>% purrr::set_names(dfNamePop$countyFIPS)
# Create and display plot (if requested)
if(isTRUE(printPlot)) {
p1 <- dfCombine %>%
filter(name %in% all_of(plotVars), !is.na(value)) %>%
ggplot(aes(x=date, y=value)) +
facet_wrap(~varNameVec[name], scales="free_y", ncol=2) +
labs(x=NULL, y=NULL, title="Evolution of metrics by select county") +
scale_color_discrete("")
# Add lines colored appropriately
if("list" %in% class(ctyIDs)) p1 <- p1 + geom_line(aes(color=mapName, group=mapName))
else p1 <- p1 + geom_line(aes(color=namePopVec[countyFIPS], group=countyFIPS))
print(p1)
}
# Return data if requested
if(isTRUE(returnData)) return(dfCombine)
}
# Run with defaults
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=ctyList)
# Run for data only
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=ctyList, printPlot=FALSE)
## # A tibble: 15,066 x 4
## countyFIPS date name value
## <chr> <date> <chr> <dbl>
## 1 01089 2020-01-22 dpm7 NA
## 2 01089 2020-01-22 tdpm7 NA
## 3 01089 2020-01-22 cpm7 NA
## 4 01089 2020-01-22 tcpm7 NA
## 5 01089 2020-01-22 vxcpoppct NA
## 6 01089 2020-01-22 vxcgte65pct NA
## 7 29510 2020-01-22 dpm7 NA
## 8 29510 2020-01-22 tdpm7 NA
## 9 29510 2020-01-22 cpm7 NA
## 10 29510 2020-01-22 tcpm7 NA
## # ... with 15,056 more rows
The function has been updated so that it can take an amalgamation of counties:
# Single amalgamation
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=list("all"=ctyList))
# Select states
keyStates <- c("NY", "LA", "CA", "TX")
tmpStates <- cty_newdata_20220507$countyData %>%
filter(pop >= 100000, pop <= 500000, state %in% all_of(keyStates))
tmpList <- lapply(keyStates, FUN=function(x) tmpStates %>% filter(state==x) %>% pull(countyFIPS)) %>%
purrr::set_names(paste0(keyStates, " counties 100k-500k"))
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)
## # A tibble: 17,352 x 5
## mapName date name value pop
## <chr> <date> <chr> <dbl> <dbl>
## 1 CA counties 100k-500k 2020-01-25 cpm7 2.61 5357286
## 2 CA counties 100k-500k 2020-01-25 dpm7 0 5357286
## 3 CA counties 100k-500k 2020-01-25 tcpm7 17.5 5357286
## 4 CA counties 100k-500k 2020-01-25 tdpm7 0 5357286
## 5 CA counties 100k-500k 2020-01-26 cpm7 0.293 5357286
## 6 CA counties 100k-500k 2020-01-26 dpm7 0 5357286
## 7 CA counties 100k-500k 2020-01-26 tcpm7 17.8 5357286
## 8 CA counties 100k-500k 2020-01-26 tdpm7 0 5357286
## 9 CA counties 100k-500k 2020-01-27 cpm7 0.213 5357286
## 10 CA counties 100k-500k 2020-01-27 dpm7 0 5357286
## # ... with 17,342 more rows
The function is run on subsets of counties based on most recent vaccination data:
tmpSegVax <- cty_vaxdata_20220508$vaxFix %>%
filter(date==max(date), pop >= 50000, pop <= 500000, !is.na(vxcgte18pct))
tmpSegVax %>%
ggplot(aes(x=vxcgte18pct)) +
geom_histogram(fill="lightblue", bins=100) +
lims(x=c(0, 100)) +
labs(x="% Vaccinated (18+)", y=NULL, title="# counties by % vaccinated (18+)", subtitle="Population 50k-500k")
## Warning: Removed 2 rows containing missing values (geom_bar).
tmpSegVax <- tmpSegVax %>%
mutate(bucket=case_when(vxcgte18pct<55 ~ "0-54",
vxcgte18pct<=70 ~ "55-70",
vxcgte18pct<=100 ~ "71-100",
TRUE ~ "err"
)
)
tmpList <- lapply(c("0-54", "55-70", "71-100"), FUN=function(x) filter(tmpSegVax, bucket==x) %>% pull(FIPS)) %>%
purrr::set_names("3. Under 55%", "2. 55%-70%", "1. Over 70%")
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)
## # A tibble: 13,014 x 5
## mapName date name value pop
## <chr> <date> <chr> <dbl> <dbl>
## 1 1. Over 70% 2020-01-25 cpm7 0.187 55026673
## 2 1. Over 70% 2020-01-25 dpm7 0.00260 55026673
## 3 1. Over 70% 2020-01-25 tcpm7 1.23 55026673
## 4 1. Over 70% 2020-01-25 tdpm7 0.0182 55026673
## 5 1. Over 70% 2020-01-26 cpm7 0.0234 55026673
## 6 1. Over 70% 2020-01-26 dpm7 0 55026673
## 7 1. Over 70% 2020-01-26 tcpm7 1.26 55026673
## 8 1. Over 70% 2020-01-26 tdpm7 0.0182 55026673
## 9 1. Over 70% 2020-01-27 cpm7 0.0234 55026673
## 10 1. Over 70% 2020-01-27 dpm7 0 55026673
## # ... with 13,004 more rows
The function is run on subsets of counties based on most recent change in deaths per capita:
tmpSegDeath <- cty_newdata_20220507$dfPerCapita %>%
inner_join(select(cty_newdata_20220507$countyData, countyFIPS, pop), by="countyFIPS") %>%
filter(date %in% c(max(date), max(date)-lubridate::days(360)),
pop >= 50000, pop <= 500000,
!is.na(tdpm)
) %>%
group_by(countyFIPS) %>%
mutate(deltaDeath=max(tdpm)-min(tdpm)) %>%
ungroup()
tmpSegDeath %>%
filter(date==max(date)) %>%
ggplot(aes(x=deltaDeath)) +
geom_histogram(fill="lightblue", bins=100) +
labs(x="Death per million in most recent year",
y=NULL,
title="# counties by DPM in most recent year",
subtitle="Population 50k-500k"
)
tmpSegDeath <- tmpSegDeath %>%
filter(date==max(date)) %>%
mutate(bucket=case_when(deltaDeath<1000 ~ "0-999",
deltaDeath<=2000 ~ "1000-2000",
deltaDeath<=4000 ~ "2001+",
TRUE ~ "err"
)
)
tmpList <- lapply(c("0-999", "1000-2000", "2001+"),
FUN=function(x) filter(tmpSegDeath, bucket==x) %>% pull(countyFIPS)
) %>%
purrr::set_names("3. Under 1000", "2. 1000-2000", "1. Over 2000")
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)
## # A tibble: 13,014 x 5
## mapName date name value pop
## <chr> <date> <chr> <dbl> <dbl>
## 1 1. Over 2000 2020-01-25 cpm7 0 16960028
## 2 1. Over 2000 2020-01-25 dpm7 0 16960028
## 3 1. Over 2000 2020-01-25 tcpm7 0 16960028
## 4 1. Over 2000 2020-01-25 tdpm7 0 16960028
## 5 1. Over 2000 2020-01-26 cpm7 0 16960028
## 6 1. Over 2000 2020-01-26 dpm7 0 16960028
## 7 1. Over 2000 2020-01-26 tcpm7 0 16960028
## 8 1. Over 2000 2020-01-26 tdpm7 0 16960028
## 9 1. Over 2000 2020-01-27 cpm7 0 16960028
## 10 1. Over 2000 2020-01-27 dpm7 0 16960028
## # ... with 13,004 more rows
The function is run on subsets of counties based on most recent change in cases per capita:
tmpSegCase <- cty_newdata_20220507$dfPerCapita %>%
inner_join(select(cty_newdata_20220507$countyData, countyFIPS, pop), by="countyFIPS") %>%
filter(date %in% c(max(date), max(date)-lubridate::days(360)),
pop >= 50000, pop <= 500000,
!is.na(tcpm)
) %>%
group_by(countyFIPS) %>%
mutate(deltaCase=max(tcpm)-min(tcpm)) %>%
ungroup()
tmpSegCase %>%
filter(date==max(date)) %>%
ggplot(aes(x=deltaCase)) +
geom_histogram(fill="lightblue", bins=100) +
labs(x="Cases per million in most recent year",
y=NULL,
title="# counties by CPM in most recent year",
subtitle="Population 50k-500k"
)
tmpSegCase <- tmpSegCase %>%
filter(date==max(date)) %>%
mutate(bucket=case_when(deltaCase<125000 ~ "0-125k",
deltaCase<=175000 ~ "125k-175k",
deltaCase<=300000 ~ "175k+",
TRUE ~ "err"
)
)
tmpList <- lapply(c("0-125k", "125k-175k", "175k+"),
FUN=function(x) filter(tmpSegCase, bucket==x) %>% pull(countyFIPS)
) %>%
purrr::set_names("3. Under 125k", "2. 125k-175k", "1. Over 175k")
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)
## # A tibble: 13,014 x 5
## mapName date name value pop
## <chr> <date> <chr> <dbl> <dbl>
## 1 1. Over 175k 2020-01-25 cpm7 1.13e- 1 17660748
## 2 1. Over 175k 2020-01-25 dpm7 8.09e- 3 17660748
## 3 1. Over 175k 2020-01-25 tcpm7 7.85e- 1 17660748
## 4 1. Over 175k 2020-01-25 tdpm7 5.66e- 2 17660748
## 5 1. Over 175k 2020-01-26 cpm7 8.09e- 3 17660748
## 6 1. Over 175k 2020-01-26 dpm7 0 17660748
## 7 1. Over 175k 2020-01-26 tcpm7 7.93e- 1 17660748
## 8 1. Over 175k 2020-01-26 tdpm7 5.66e- 2 17660748
## 9 1. Over 175k 2020-01-27 cpm7 6.59e-18 17660748
## 10 1. Over 175k 2020-01-27 dpm7 0 17660748
## # ... with 13,004 more rows
The process is converted to functional form, allowing for custom labels:
compareBurdenVaxCounty <- function(lstBurden,
lstVax,
ctyIDs,
burdenVars=c("dpm7", "tdpm7", "cpm7", "tcpm7"),
vaxVars=c("vxcpoppct", "vxcgte65pct"),
plotVars=c(burdenVars, vaxVars),
plotVarNames=c("1. Death per million", "2. Death per million (cum)",
"3. Case per million", "4. Case per million (cum)",
"5. % Vaccinated (all)", "6. % Vaccinated (65+)"
),
printPlot=TRUE,
p1Title="Evolution of metrics by select county",
p1SubTitle=ggplot2::waiver(),
p1ScaleLabel="",
returnData=!isTRUE(printPlot)
) {
# FUNCTION ARGUMENTS:
# lstBurden: processed file containing county-level burden data
# lstVax: processed file containing county-level vaccines data
# ctyIDs: vector of county-FIPS to process OR named list of county-FIPS to amalgamate
# if vector, each plotted separately with county name as legend and color
# if named list, each element amalgamated with list name as legend and color
# burdenVars: variables to plot from burden data
# vaxVars: variables to plot from vaccines data
# plotVars: variables to be plotted
# plotVarNames: names to be associated to plotVars
# printPlot: boolean, should plot be created and printed?
# p1Title: title to be used in plot
# p1SubTitle: subtitle to be used in plot
# p1ScaleLabel: label to be used for the color scale in plot
# returnData: boolean, should data frame be returned?
# Create valid list of countyFIPS for processing
# If passed as list, convert to vector; convert all to 5-string characters
if("list" %in% class(ctyIDs)) ctyUse <- zeroPad5(unname(unlist(ctyIDs)))
else ctyUse <- zeroPad5(ctyIDs)
# Extract name and population data
dfNamePop <- lstBurden[["countyData"]] %>%
filter(countyFIPS %in% ctyUse) %>%
mutate(fullName=paste0(countyName, ", ", state, " (pop: ", round(pop/1000), "k)"))
# Extract burden per-capita data
dfBurden <- lstBurden[["dfPerCapita"]] %>%
filter(countyFIPS %in% ctyUse)
# Extract vaccines per-capita data
dfVax <- lstVax[["vaxFix"]] %>%
filter(FIPS %in% ctyUse)
# Create integrated database of burden and vaccines
dfCombine <- dfBurden %>%
select(countyFIPS, date, all_of(burdenVars)) %>%
full_join(dfVax %>% select(countyFIPS=FIPS, date, all_of(vaxVars)),
by=c("countyFIPS", "date")
) %>%
pivot_longer(-c(countyFIPS, date))
# If data to be amalgamated, run here
if("list" %in% class(ctyIDs)) {
# Create mapping frame
mapFrame <- map_dfr(names(ctyIDs), .f=function(x) tibble::tibble(mapName=x, countyFIPS=ctyIDs[[x]]))
# Add population data and mapped name
dfCombine <- dfCombine %>%
filter(!is.na(value)) %>%
inner_join(select(dfNamePop, countyFIPS, pop), by="countyFIPS") %>%
inner_join(mapFrame, by="countyFIPS") %>%
group_by(mapName, date, name) %>%
summarize(value=sum(value*pop)/sum(pop), pop=sum(pop), n=n(), .groups="drop") %>%
group_by(mapName) %>%
mutate(mapName2=paste0(mapName, " (n=", max(n), ")")) %>%
ungroup()
}
# Create vectors for mapping countyFIPS and metric abbreviations to descriptive names
varNameVec <- plotVarNames %>% purrr::set_names(plotVars)
# Create vectors for mapping countyFIPS to descriptive names (only relevant if not amalgamated)
namePopVec <- dfNamePop$fullName %>% purrr::set_names(dfNamePop$countyFIPS)
# Create and display plot (if requested)
if(isTRUE(printPlot)) {
p1 <- dfCombine %>%
filter(name %in% all_of(plotVars), !is.na(value)) %>%
ggplot(aes(x=date, y=value)) +
facet_wrap(~varNameVec[name], scales="free_y", ncol=2) +
labs(x=NULL, y=NULL, title=p1Title, subtitle=p1SubTitle) +
scale_color_discrete(p1ScaleLabel)
# Add lines colored appropriately
if("list" %in% class(ctyIDs)) {
if(is.null(p1ScaleLabel) | p1ScaleLabel=="") p1 <- p1 + geom_line(aes(color=mapName, group=mapName))
else p1 <- p1 + geom_line(aes(color=mapName2, group=mapName2))
}
else p1 <- p1 + geom_line(aes(color=namePopVec[countyFIPS], group=countyFIPS))
print(p1)
}
# Return data if requested
if(isTRUE(returnData)) return(dfCombine)
}
plotCountyEvolution <- function(lst,
lstVax,
cutsLabels,
baseVar,
lstFilter=list(),
lstExclude=list(),
segVar=baseVar,
popRange=c(0, +Inf),
dateRange=c(lubridate::days(0)),
p1Title="Evolution of metrics by select county",
p1SubTitle=ggplot2::waiver(),
p1ScaleLabel=""
) {
# FUNCTION ARGUMNETS:
# lst: processed list file containing per-capita and county population data
# lstVax: processed list file containing vaccines data
# cutsLabels: named character vector of form c("label"="values-up-tp")
# baseVar: base variable to be used for segmenting (needs to be not-NA)
# lstFilter: named list of items to include when setting df (e.g., list("state"=state.abb[1:10]))
# lstExclude: named list of items to exclude when setting df (e.g., list("state"=state.abb[41:50]))
# segVar: variable used in segmentation (may be a transform of baseVar, allow for different name)
# popRange: character vector of length 2 for minimum and maximum population to include
# dateRange: dates to be considered for segmenting (all subtracted from max(date))
# if length 1, that date is used for segmenting on the key metric
# if length 2, change in variables between those dates is used on the key metric
# p1Title: title to be used in plot
# p1SubTitle: subtitle to be used in plot
# p1ScaleLabel: label to be used for the color scale in plot
# Check that dateRange is length-1 or length-2
if(!(length(dateRange) %in% c(1, 2))) error("\nMust pass a dateRange vector of length 1 or length 2\n")
# Create database including population, filter by population and relevant dates, remove NA issues
df <- lst[["dfPerCapita"]] %>%
inner_join(select(lst[["countyData"]], countyFIPS, pop), by="countyFIPS") %>%
inner_join(select(lstVax[["vaxFix"]], date, countyFIPS=FIPS, vxcpoppct, vxcgte18pct, vxcgte65pct),
by=c("date", "countyFIPS")
) %>%
filter(pop >= popRange[1],
pop <= popRange[2],
date %in% (max(date)-dateRange),
!is.na(get(baseVar))
) %>%
rowFilter(lstFilter=lstFilter, lstExclude=lstExclude)
# If daterange is of length-2, calculate change in variable
if(length(dateRange)==2) {
if(segVar==baseVar) segVar <- "tempNew"
df <- df %>%
group_by(countyFIPS) %>%
mutate(tempNew=sum(get(baseVar)*(date==max(date)))-sum(get(baseVar)*(date!=max(date)))) %>%
ungroup() %>%
colRenamer(vecRename=c("tempNew"=segVar))
}
# Histogram for key variables
p1 <- df %>%
filter(date==max(date)) %>%
ggplot(aes_string(x=segVar)) +
geom_histogram(fill="lightblue", bins=100) +
labs(x=paste0("Variable: ", segVar),
y=NULL,
title=paste0("# counties by ", segVar, " bucket by time period")
) +
facet_wrap(~date)
print(p1)
# Create buckets for plotting
df <- df %>%
filter(date==max(date)) %>%
mutate(bucket=c(names(cutsLabels), "999. error high")[1+findInterval(get(segVar), unname(cutsLabels))])
# Create county list
tmpList <- lapply(c(names(cutsLabels), "999. error high"),
FUN=function(x) filter(df, bucket==x) %>% pull(countyFIPS)
) %>%
purrr::set_names(c(names(cutsLabels), "999. error high"))
# Create plot of metrics over time
compareBurdenVaxCounty(lstBurden=lst,
lstVax=lstVax,
ctyIDs=tmpList,
returnData=TRUE,
p1Title=p1Title,
p1SubTitle=p1SubTitle,
p1ScaleLabel=p1ScaleLabel
)
}
plotCountyEvolution(cty_newdata_20220507,
lstVax=cty_vaxdata_20220508,
baseVar="tcpm",
lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
popRange=c(100000, 250000),
cutsLabels=c("1. Under 200k"=200000, "2. 200k-300k"=300000, "3. Over 300k"=500000)
)
## # A tibble: 13,014 x 7
## mapName date name value pop n mapName2
## <chr> <date> <chr> <dbl> <dbl> <int> <chr>
## 1 1. Under 200k 2020-01-25 cpm7 0.221 5806894 37 1. Under 200k (n=37)
## 2 1. Under 200k 2020-01-25 dpm7 0 5806894 37 1. Under 200k (n=37)
## 3 1. Under 200k 2020-01-25 tcpm7 1.35 5806894 37 1. Under 200k (n=37)
## 4 1. Under 200k 2020-01-25 tdpm7 0 5806894 37 1. Under 200k (n=37)
## 5 1. Under 200k 2020-01-26 cpm7 0.0492 5806894 37 1. Under 200k (n=37)
## 6 1. Under 200k 2020-01-26 dpm7 0 5806894 37 1. Under 200k (n=37)
## 7 1. Under 200k 2020-01-26 tcpm7 1.40 5806894 37 1. Under 200k (n=37)
## 8 1. Under 200k 2020-01-26 tdpm7 0 5806894 37 1. Under 200k (n=37)
## 9 1. Under 200k 2020-01-27 cpm7 0.0492 5806894 37 1. Under 200k (n=37)
## 10 1. Under 200k 2020-01-27 dpm7 0 5806894 37 1. Under 200k (n=37)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507,
lstVax=cty_vaxdata_20220508,
baseVar="tdpm",
lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
popRange=c(100000, 250000),
cutsLabels=c("1. Under 2000"=2000, "2. 2000-4000"=4000, "3. Over 4000"=10000)
)
## # A tibble: 13,014 x 7
## mapName date name value pop n mapName2
## <chr> <date> <chr> <dbl> <dbl> <int> <chr>
## 1 1. Under 2000 2020-01-25 cpm7 0.142 9051032 57 1. Under 2000 (n=57)
## 2 1. Under 2000 2020-01-25 dpm7 0.0158 9051032 57 1. Under 2000 (n=57)
## 3 1. Under 2000 2020-01-25 tcpm7 0.868 9051032 57 1. Under 2000 (n=57)
## 4 1. Under 2000 2020-01-25 tdpm7 0.110 9051032 57 1. Under 2000 (n=57)
## 5 1. Under 2000 2020-01-26 cpm7 0.0316 9051032 57 1. Under 2000 (n=57)
## 6 1. Under 2000 2020-01-26 dpm7 0 9051032 57 1. Under 2000 (n=57)
## 7 1. Under 2000 2020-01-26 tcpm7 0.900 9051032 57 1. Under 2000 (n=57)
## 8 1. Under 2000 2020-01-26 tdpm7 0.110 9051032 57 1. Under 2000 (n=57)
## 9 1. Under 2000 2020-01-27 cpm7 0.0316 9051032 57 1. Under 2000 (n=57)
## 10 1. Under 2000 2020-01-27 dpm7 0 9051032 57 1. Under 2000 (n=57)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507,
lstVax=cty_vaxdata_20220508,
baseVar="vxcgte65pct",
lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
popRange=c(100000, 250000),
cutsLabels=c("1. Under 80%"=80, "2. 80%-90%"=90, "3. Over 90%"=100)
)
## # A tibble: 13,014 x 7
## mapName date name value pop n mapName2
## <chr> <date> <chr> <dbl> <dbl> <int> <chr>
## 1 1. Under 80% 2020-01-25 cpm7 3.12e- 1 6406205 41 1. Under 80% (n=41)
## 2 1. Under 80% 2020-01-25 dpm7 0 6406205 41 1. Under 80% (n=41)
## 3 1. Under 80% 2020-01-25 tcpm7 2.16e+ 0 6406205 41 1. Under 80% (n=41)
## 4 1. Under 80% 2020-01-25 tdpm7 0 6406205 41 1. Under 80% (n=41)
## 5 1. Under 80% 2020-01-26 cpm7 2.23e- 2 6406205 41 1. Under 80% (n=41)
## 6 1. Under 80% 2020-01-26 dpm7 0 6406205 41 1. Under 80% (n=41)
## 7 1. Under 80% 2020-01-26 tcpm7 2.19e+ 0 6406205 41 1. Under 80% (n=41)
## 8 1. Under 80% 2020-01-26 tdpm7 0 6406205 41 1. Under 80% (n=41)
## 9 1. Under 80% 2020-01-27 cpm7 1.82e-17 6406205 41 1. Under 80% (n=41)
## 10 1. Under 80% 2020-01-27 dpm7 0 6406205 41 1. Under 80% (n=41)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507,
lstVax=cty_vaxdata_20220508,
baseVar="tdpm",
segVar="d_tdpm_360",
lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
popRange=c(100000, 250000),
cutsLabels=c("1. Under 1000"=1000, "2. 1000-2000"=2000, "3. Over 2000"=4000),
dateRange=c(lubridate::days(0), lubridate::days(360)),
p1ScaleLabel="Segmented By: DPM\n(most recent 360 days)",
p1SubTitle=paste0("US counties of 100k-250k (excluding states with spiky data)\n")
)
## # A tibble: 13,014 x 7
## mapName date name value pop n mapName2
## <chr> <date> <chr> <dbl> <dbl> <int> <chr>
## 1 1. Under 1000 2020-01-25 cpm7 0.128 13373433 83 1. Under 1000 (n=83)
## 2 1. Under 1000 2020-01-25 dpm7 0.0107 13373433 83 1. Under 1000 (n=83)
## 3 1. Under 1000 2020-01-25 tcpm7 0.812 13373433 83 1. Under 1000 (n=83)
## 4 1. Under 1000 2020-01-25 tdpm7 0.0748 13373433 83 1. Under 1000 (n=83)
## 5 1. Under 1000 2020-01-26 cpm7 0.0214 13373433 83 1. Under 1000 (n=83)
## 6 1. Under 1000 2020-01-26 dpm7 0 13373433 83 1. Under 1000 (n=83)
## 7 1. Under 1000 2020-01-26 tcpm7 0.833 13373433 83 1. Under 1000 (n=83)
## 8 1. Under 1000 2020-01-26 tdpm7 0.0748 13373433 83 1. Under 1000 (n=83)
## 9 1. Under 1000 2020-01-27 cpm7 0.0214 13373433 83 1. Under 1000 (n=83)
## 10 1. Under 1000 2020-01-27 dpm7 0 13373433 83 1. Under 1000 (n=83)
## # ... with 13,004 more rows
# Pull states from West South Central
wsc <- state.abb[state.division %in% c("West South Central", "East South Central")]
plotCountyEvolution(cty_newdata_20220507,
lstVax=cty_vaxdata_20220508,
baseVar="tdpm",
segVar="d_tdpm_360",
lstFilter = list("state"=wsc),
lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
popRange=c(100000, 250000),
cutsLabels=c("1. Under 1000"=1000, "2. 1000-2000"=2000, "3. Over 2000"=4000),
dateRange=c(lubridate::days(0), lubridate::days(360)),
p1ScaleLabel="Segmented By: DPM\n(most recent 360 days)",
p1SubTitle=paste0("South Central counties of 100k-250k (excluding states with spiky data)\n")
)
## # A tibble: 13,014 x 7
## mapName date name value pop n mapName2
## <chr> <date> <chr> <dbl> <dbl> <int> <chr>
## 1 1. Under 1000 2020-01-25 cpm7 0 848810 4 1. Under 1000 (n=4)
## 2 1. Under 1000 2020-01-25 dpm7 0 848810 4 1. Under 1000 (n=4)
## 3 1. Under 1000 2020-01-25 tcpm7 0 848810 4 1. Under 1000 (n=4)
## 4 1. Under 1000 2020-01-25 tdpm7 0 848810 4 1. Under 1000 (n=4)
## 5 1. Under 1000 2020-01-26 cpm7 0 848810 4 1. Under 1000 (n=4)
## 6 1. Under 1000 2020-01-26 dpm7 0 848810 4 1. Under 1000 (n=4)
## 7 1. Under 1000 2020-01-26 tcpm7 0 848810 4 1. Under 1000 (n=4)
## 8 1. Under 1000 2020-01-26 tdpm7 0 848810 4 1. Under 1000 (n=4)
## 9 1. Under 1000 2020-01-27 cpm7 0 848810 4 1. Under 1000 (n=4)
## 10 1. Under 1000 2020-01-27 dpm7 0 848810 4 1. Under 1000 (n=4)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507,
lstVax=cty_vaxdata_20220508,
baseVar="vxcgte65pct",
lstFilter = list("state"=wsc),
lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
popRange=c(100000, 250000),
cutsLabels=c("1. Under 80%"=80, "2. 80%-90%"=90, "3. Over 90%"=100),
p1ScaleLabel="Segmented By: %vax\n(65+ population)",
p1SubTitle=paste0("South Central counties of 100k-250k (excluding states with spiky data)\n")
)
## # A tibble: 13,014 x 7
## mapName date name value pop n mapName2
## <chr> <date> <chr> <dbl> <dbl> <int> <chr>
## 1 1. Under 80% 2020-01-25 cpm7 0 2567398 17 1. Under 80% (n=17)
## 2 1. Under 80% 2020-01-25 dpm7 0 2567398 17 1. Under 80% (n=17)
## 3 1. Under 80% 2020-01-25 tcpm7 0 2567398 17 1. Under 80% (n=17)
## 4 1. Under 80% 2020-01-25 tdpm7 0 2567398 17 1. Under 80% (n=17)
## 5 1. Under 80% 2020-01-26 cpm7 0 2567398 17 1. Under 80% (n=17)
## 6 1. Under 80% 2020-01-26 dpm7 0 2567398 17 1. Under 80% (n=17)
## 7 1. Under 80% 2020-01-26 tcpm7 0 2567398 17 1. Under 80% (n=17)
## 8 1. Under 80% 2020-01-26 tdpm7 0 2567398 17 1. Under 80% (n=17)
## 9 1. Under 80% 2020-01-27 cpm7 0 2567398 17 1. Under 80% (n=17)
## 10 1. Under 80% 2020-01-27 dpm7 0 2567398 17 1. Under 80% (n=17)
## # ... with 13,004 more rows
Further exploration is made of growth rates in key metrics:
# Create integrated data, using only select time periods
dfVaxBurden_20220507 <- cty_newdata_20220507$dfPerCapita %>%
inner_join(cty_vaxdata_20220508$vaxFix, by=c("countyFIPS"="FIPS", "state", "date")) %>%
filter(date %in% c(max(date)-lubridate::days(c(4, 369))))
# Summarize and create plot
dfPlotVaxBurden <- dfVaxBurden_20220507 %>%
filter(pop >= 25000) %>%
mutate(minday=ifelse(date==min(date), 1, 0), maxday=ifelse(date==max(date), 1, 0)) %>%
group_by(countyFIPS) %>%
filter(n()==2) %>%
summarize(across(c(tcpm7, tdpm7, cpm7, dpm7),
.fns=function(x) sum(x*maxday)/sum(x*minday)-1,
.names="g_{.col}"
),
pop=median(pop),
mu_vxcpoppct=mean(vxcpoppct)
) %>%
pivot_longer(-c(countyFIPS, pop, mu_vxcpoppct)) %>%
filter(value >= -1,
value <= 10
)
dfPlotVaxBurden %>%
ggplot(aes(x=mu_vxcpoppct, y=value)) +
geom_point(aes(size=pop), alpha=0.25) +
geom_smooth(aes(weight=pop), method="lm") +
geom_hline(yintercept=c(-1, 0, 1), lty=2, color="red") +
labs(y="Growth rate from 1 year ago",
x="% population vaccinated",
title="Association of 1-year change in burden with vaccination",
subtitle="Linear model is population weighted for counties >25k population with growth between -1 and 10"
) +
facet_wrap(~name)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (stat_smooth).
## Warning: Removed 16 rows containing missing values (geom_point).
dfPlotVaxBurden %>%
lm(value ~ mu_vxcpoppct:name + name + 0, data=., weights=pop) %>%
summary()
##
## Call:
## lm(formula = value ~ mu_vxcpoppct:name + name + 0, data = .,
## weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -3468.8 -142.7 -12.7 97.9 16322.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -3.673348 0.156450 -23.479 < 2e-16 ***
## nameg_dpm7 0.426972 0.175463 2.433 0.01499 *
## nameg_tcpm7 1.205051 0.151815 7.938 2.47e-15 ***
## nameg_tdpm7 1.577777 0.151850 10.390 < 2e-16 ***
## mu_vxcpoppct:nameg_cpm7 0.098047 0.003404 28.801 < 2e-16 ***
## mu_vxcpoppct:nameg_dpm7 -0.011104 0.003751 -2.960 0.00309 **
## mu_vxcpoppct:nameg_tcpm7 0.008036 0.003293 2.440 0.01471 *
## mu_vxcpoppct:nameg_tdpm7 -0.019031 0.003294 -5.777 8.03e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 556.3 on 5639 degrees of freedom
## (16 observations deleted due to missingness)
## Multiple R-squared: 0.4357, Adjusted R-squared: 0.4349
## F-statistic: 544.2 on 8 and 5639 DF, p-value: < 2.2e-16
The process is converted to functional form:
# 1. Create integrated data
integrateCountyBurdenVax <- function(lstBurden, lstVax, keyDates=NULL) {
# FUNCTION ARGUMENTS:
# lstBurden: processed county-level burden list containing dfPerCapita
# lstVax: processed county-level vaccines list containing vaxFix
# keyDates: character vector of integers for days to subtract from max(date) and include
# NULL means include all
# Convert keyDates to useDates appropriately
useDates <- sort(intersect(unique(lstBurden[["dfPerCapita"]]$date), unique(lstVax[["vaxFix"]]$date)))
useDates <- as.Date(useDates, origin="1970-01-01")
if(!is.null(keyDates)) useDates <- useDates[useDates %in% (max(useDates)-lubridate::days(keyDates))]
# Create integrated burden data, using requested time periods
lstBurden[["dfPerCapita"]] %>%
filter(date %in% useDates) %>%
inner_join(lstVax[["vaxFix"]], by=c("countyFIPS"="FIPS", "state", "date"))
}
# 2. Create integrated county plot data
makeIntegratedCountyPlotData <- function(df,
burdenVars=c("tcpm7", "tdpm7", "cpm7", "dpm7"),
vaxVars=c("vxcpoppct", "vxcgte18pct", "vxcgte65pct"),
makeBurdenGrowth=TRUE,
fnBurden=mean,
makeVaxGrowth=FALSE,
fnVax=mean
) {
# FUNCTION ARGUMENTS:
# df: processed data frame with integrated burden and vaccines
# burdenVars: burden variables to be summarized
# vaxVars: vaccine variables to be summarized
# makeBurdenGrowth: boolean, should burden be calculated as latest minus earliest?
# fnBurden: function for summarizing burden fields (used only if makeBurdenGrowth is FALSE)
# makeVaxGrowth: boolean, should vaccines be calculated as latest minus earliest?
# fnVax: function for summarizing vaccine fields (used only if makeBurdenGrowth is FALSE)
# Create the prefixes for the burden and vaccines variables
burdenPrefix <- ifelse(isTRUE(makeBurdenGrowth), "g", deparse(substitute(fnBurden)))
vaxPrefix <- ifelse(isTRUE(makeVaxGrowth), "g", deparse(substitute(fnVax)))
# Create delta variables if requested
df %>%
mutate(minday=ifelse(date==min(date), 1, 0), maxday=ifelse(date==max(date), 1, 0)) %>%
group_by(countyFIPS) %>%
summarize(across(all_of(burdenVars),
.fns=if(isTRUE(makeBurdenGrowth)) function(x) sum(x*maxday)/sum(x*minday)-1 else fnBurden,
.names=paste0(burdenPrefix, "_{.col}")
),
pop=median(pop),
across(all_of(vaxVars),
.fns=if(isTRUE(makeVaxGrowth)) function(x) sum(x*maxday)/sum(x*minday)-1 else fnVax,
.names=paste0(vaxPrefix, "_{.col}")
),
hasMaxMin=(sum(maxday)==1 & sum(minday)==1)
) %>%
pivot_longer(-c(countyFIPS))
}
#3. Create plots
plotIntegratedCounty <- function(df,
demVars=c("countyFIPS", "pop", "hasMaxMin"),
vaxVar="mean_vxcpoppct",
burdenVars=c("g_tcpm7", "g_tdpm7", "g_cpm7", "g_dpm7"),
checkMaxMin=TRUE,
popRange=c(1, +Inf),
valueRange=c(-1, 10),
xLabel=NULL,
yLabel="Growth rate from 1 year ago",
plotTitle="Association of 1-year change in burden with vaccination",
plotSubtitle=NULL,
returnData=TRUE
) {
# FUNCTION ARGUMENTS:
# df: data frame from makeIntegratedCountyPlotData()
# demVars: variables for demographics
# vaxVar: x-axis variable (vaccine)
# burdenVars: variables on the facetted y-axes (burden)
# checkMaxMin: boolean, keep only counties with data on the max and min dates (growth meaningless otherwise)
# popRange: acceptable population range for data
# valueRange: range of values to keep for plotting metrics
# xLabel: label for the x-axis (NULL means create from parameters)
# yLabel: label for the y-axis
# plotTitle: title for the plot
# plotSubtitle: subtitle for the plot
# returnData: boolean, should data (df) be returned?
# Create labels where needed
if(is.null(xLabel)) {
xLabel <- paste0("% population ",
if(vaxVar=="mean_vxcgte18pct") "(18+) " else if(vaxVar=="mean_vxcgte65pct") "(65+) " else "",
"vaccinated"
)
}
if(is.null(plotSubtitle)) {
if(min(popRange)<=1 & max(popRange)==+Inf) popLab <- " of any "
else if(min(popRange)<=1 & max(popRange)<+Inf) popLab <- paste0(" <", round(max(popRange)/1000, 1), "k ")
else if(min(popRange)>1 & max(popRange)==Inf) popLab <- paste0(" >", round(min(popRange)/1000, 1), "k ")
else popLab <- paste0(" between ", round(min(popRange)/1000, 1), "k and ", round(max(popRange)/1000, 1), "k ")
if(min(valueRange)==-Inf & max(valueRange)==+Inf) valueLab <- " of any value"
else if(min(valueRange)==-Inf & max(valueRange)<+Inf) valueLab <- paste0(" <", max(valueRange))
else if(min(valueRange)>-Inf & max(valueRange)==Inf) valueLab <- paste0(" >", min(valueRange))
else valueLab <- paste0(" between ", min(valueRange), " and ", max(valueRange))
plotSubtitle <- paste0("Linear model is population weighted for counties",
popLab,
"population with y-axis",
valueLab
)
}
# Create the data frame
df <- df %>%
filter(name %in% c(all_of(demVars), all_of(vaxVar), all_of(burdenVars))) %>%
pivot_wider(c(countyFIPS)) %>%
filter(if(isTRUE(checkMaxMin)) hasMaxMin==1 else TRUE,
pop>=popRange[1],
pop<=popRange[2]
) %>%
pivot_longer(-c(all_of(demVars), all_of(vaxVar))) %>%
filter(value >= valueRange[1], value <= valueRange[2])
# Create and print the plot
p1 <- df %>%
ggplot(aes_string(x=vaxVar, y="value")) +
geom_point(aes(size=pop), alpha=0.25) +
geom_smooth(aes(weight=pop), method="lm") +
geom_hline(yintercept=c(-1, 0, 1), lty=2, color="red") +
labs(y=yLabel,
x=xLabel,
title=plotTitle,
subtitle=plotSubtitle
) +
facet_wrap(~name)
print(p1)
if(isTRUE(returnData)) return(df)
}
#4. Create regressions
regIntegratedCounty <- function(df,
vaxVar="mean_vxcpoppct",
returnData=FALSE
) {
# FUNCTION ARGUMENTS:
# df: data frame from lotIntegratedCounty
# vaxVar: x-axis variable (vaccine)
# returnData: boolean, should data (df) be returned?
df %>%
lm(value ~ get(vaxVar):name + name + 0, data=., weights=pop) %>%
summary() %>%
print()
if(isTRUE(returnData)) return(df)
}
# Example for all dates and for select dates
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508)
## # A tibble: 1,599,278 x 22
## countyFIPS state date cases new_cases deaths new_deaths tcpm cpm
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 01001 AL 2020-12-13 3233 0 41 0 57868. 0
## 2 01003 AL 2020-12-13 10489 0 141 0 46987. 0
## 3 01005 AL 2020-12-13 1264 0 30 0 51203. 0
## 4 01007 AL 2020-12-13 1398 0 39 0 62427. 0
## 5 01009 AL 2020-12-13 3663 0 47 0 63345. 0
## 6 01011 AL 2020-12-13 723 0 20 0 71577. 0
## 7 01013 AL 2020-12-13 1289 0 44 0 66279. 0
## 8 01015 AL 2020-12-13 7658 0 129 0 67409. 0
## 9 01017 AL 2020-12-13 1982 0 55 0 59602. 0
## 10 01019 AL 2020-12-13 1167 0 24 0 44549. 0
## # ... with 1,599,268 more rows, and 13 more variables: tdpm <dbl>, dpm <dbl>,
## # tcpm7 <dbl>, cpm7 <dbl>, tdpm7 <dbl>, dpm7 <dbl>, countyName <chr>,
## # vxcpoppct <dbl>, vxcgte18pct <dbl>, vxcgte65pct <dbl>, pop <dbl>,
## # popgte18 <dbl>, popgte65 <dbl>
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369))
## # A tibble: 6,284 x 22
## countyFIPS state date cases new_cases deaths new_deaths tcpm cpm
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 01001 AL 2021-05-01 6907 3 107 0 123628. 53.7
## 2 01003 AL 2021-05-01 20966 25 306 1 93919. 112.
## 3 01005 AL 2021-05-01 2302 2 56 0 93251. 81.0
## 4 01007 AL 2021-05-01 2596 2 63 0 115924. 89.3
## 5 01009 AL 2021-05-01 6616 3 135 0 114412. 51.9
## 6 01011 AL 2021-05-01 1230 2 41 1 121770. 198.
## 7 01013 AL 2021-05-01 2154 2 69 0 110757. 103.
## 8 01015 AL 2021-05-01 14447 9 312 0 127169. 79.2
## 9 01017 AL 2021-05-01 3544 3 123 1 106574. 90.2
## 10 01019 AL 2021-05-01 1837 1 45 0 70125. 38.2
## # ... with 6,274 more rows, and 13 more variables: tdpm <dbl>, dpm <dbl>,
## # tcpm7 <dbl>, cpm7 <dbl>, tdpm7 <dbl>, dpm7 <dbl>, countyName <chr>,
## # vxcpoppct <dbl>, vxcgte18pct <dbl>, vxcgte65pct <dbl>, pop <dbl>,
## # popgte18 <dbl>, popgte65 <dbl>
# Example for integrated plot data
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
makeIntegratedCountyPlotData()
## # A tibble: 28,278 x 3
## countyFIPS name value
## <chr> <chr> <dbl>
## 1 01001 g_tcpm7 1.29
## 2 01001 g_tdpm7 1.02
## 3 01001 g_cpm7 -0.143
## 4 01001 g_dpm7 0.4
## 5 01001 pop 55869
## 6 01001 mean_vxcpoppct 30.4
## 7 01001 mean_vxcgte18pct 36.9
## 8 01001 mean_vxcgte65pct 57.3
## 9 01001 hasMaxMin 1
## 10 01003 g_tcpm7 1.66
## # ... with 28,268 more rows
# Check that data are the same
dfCheck1 <- integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
makeIntegratedCountyPlotData() %>%
pivot_wider(countyFIPS) %>%
filter(pop >= 25000, hasMaxMin==1)
dfCheck2 <- dfPlotVaxBurden %>%
pivot_wider(c(countyFIPS, pop, mu_vxcpoppct)) %>%
rename(mean_vxcpoppct=mu_vxcpoppct)
# Confirm same fields
setdiff(names(dfCheck1), names(dfCheck2))
## [1] "mean_vxcgte18pct" "mean_vxcgte65pct" "hasMaxMin"
setdiff(names(dfCheck2), names(dfCheck1))
## character(0)
# Confirm same counties
setdiff(dfCheck1$countyFIPS, dfCheck2$countyFIPS)
## character(0)
# Confirm same value, except for NA
dfCheck <- dfCheck1 %>%
pivot_longer(-c(countyFIPS)) %>%
inner_join(dfCheck2 %>% pivot_longer(-c(countyFIPS)), by=c("countyFIPS", "name")) %>%
mutate(diff=(is.na(value.x) != is.na(value.y)) | (!is.na(value.x) & !is.na(value.y) & value.x != value.y))
# Summary of differences by field
dfCheck %>% count(name, diff) %>% pivot_wider(name, names_from="diff", values_from="n")
## # A tibble: 6 x 3
## name `FALSE` `TRUE`
## <chr> <int> <int>
## 1 g_cpm7 1474 130
## 2 g_dpm7 1158 446
## 3 g_tcpm7 1603 1
## 4 g_tdpm7 1600 4
## 5 mean_vxcpoppct 1604 NA
## 6 pop 1604 NA
# Check that all are where the previous filtering rules produced NA for value.y
dfCheck %>% filter(diff) %>% is.na %>% colSums()
## countyFIPS name value.x value.y diff
## 0 0 0 581 0
# Check values for mismatches
dfCheck %>% filter(diff) %>% mutate(exceeds=(value.x > 10 | value.x < -1)) %>% count(name, exceeds)
## # A tibble: 4 x 3
## name exceeds n
## <chr> <lgl> <int>
## 1 g_cpm7 TRUE 130
## 2 g_dpm7 TRUE 446
## 3 g_tcpm7 TRUE 1
## 4 g_tdpm7 TRUE 4
# Example for integrated plot and regression
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
makeIntegratedCountyPlotData() %>%
plotIntegratedCounty(popRange=c(25000, +Inf)) %>%
regIntegratedCounty()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (stat_smooth).
## Warning: Removed 16 rows containing missing values (geom_point).
##
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -3468.8 -142.7 -12.7 97.9 16322.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -3.673348 0.156450 -23.479 < 2e-16 ***
## nameg_dpm7 0.426972 0.175463 2.433 0.01499 *
## nameg_tcpm7 1.205051 0.151815 7.938 2.47e-15 ***
## nameg_tdpm7 1.577777 0.151850 10.390 < 2e-16 ***
## get(vaxVar):nameg_cpm7 0.098047 0.003404 28.801 < 2e-16 ***
## get(vaxVar):nameg_dpm7 -0.011104 0.003751 -2.960 0.00309 **
## get(vaxVar):nameg_tcpm7 0.008036 0.003293 2.440 0.01471 *
## get(vaxVar):nameg_tdpm7 -0.019031 0.003294 -5.777 8.03e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 556.3 on 5639 degrees of freedom
## (16 observations deleted due to missingness)
## Multiple R-squared: 0.4357, Adjusted R-squared: 0.4349
## F-statistic: 544.2 on 8 and 5639 DF, p-value: < 2.2e-16
# Example using 65+ and 25k-500k
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
makeIntegratedCountyPlotData() %>%
plotIntegratedCounty(popRange=c(25000, 500000), vaxVar = "mean_vxcgte65pct") %>%
regIntegratedCounty(vaxVar = "mean_vxcgte65pct")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## Warning: Removed 12 rows containing missing values (geom_point).
##
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1061.4 -138.3 -35.7 59.2 5315.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -1.137473 0.148243 -7.673 2.00e-14 ***
## nameg_dpm7 0.351819 0.180899 1.945 0.051850 .
## nameg_tcpm7 1.154717 0.144110 8.013 1.38e-15 ***
## nameg_tdpm7 1.286266 0.144147 8.923 < 2e-16 ***
## get(vaxVar):nameg_cpm7 0.018417 0.002039 9.031 < 2e-16 ***
## get(vaxVar):nameg_dpm7 -0.006661 0.002468 -2.698 0.006992 **
## get(vaxVar):nameg_tcpm7 0.007065 0.001972 3.582 0.000344 ***
## get(vaxVar):nameg_tdpm7 -0.004668 0.001973 -2.366 0.018018 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 324.1 on 5107 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5091, Adjusted R-squared: 0.5084
## F-statistic: 662.1 on 8 and 5107 DF, p-value: < 2.2e-16
# Example using 18+ and 25k-500k
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
makeIntegratedCountyPlotData() %>%
plotIntegratedCounty(popRange=c(25000, 500000), vaxVar = "mean_vxcgte18pct") %>%
regIntegratedCounty(vaxVar = "mean_vxcgte18pct")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## Warning: Removed 12 rows containing missing values (geom_point).
##
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1116.6 -124.4 -28.4 65.1 5375.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -2.097971 0.128288 -16.354 < 2e-16 ***
## nameg_dpm7 0.232470 0.156932 1.481 0.138579
## nameg_tcpm7 1.096363 0.124439 8.810 < 2e-16 ***
## nameg_tdpm7 1.396290 0.124521 11.213 < 2e-16 ***
## get(vaxVar):nameg_cpm7 0.045770 0.002522 18.149 < 2e-16 ***
## get(vaxVar):nameg_dpm7 -0.007204 0.003069 -2.348 0.018929 *
## get(vaxVar):nameg_tcpm7 0.011354 0.002442 4.649 3.42e-06 ***
## get(vaxVar):nameg_tdpm7 -0.008938 0.002445 -3.656 0.000259 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 316.3 on 5107 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5327, Adjusted R-squared: 0.5319
## F-statistic: 727.6 on 8 and 5107 DF, p-value: < 2.2e-16
# Example running multiple vaccine variables
purrr::walk(.x=c("mean_vxcpoppct", "mean_vxcgte18pct", "mean_vxcgte65pct"),
.f=~integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
makeIntegratedCountyPlotData() %>%
plotIntegratedCounty(popRange=c(25000, 500000), vaxVar = .x) %>%
regIntegratedCounty(vaxVar=.x)
)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## Warning: Removed 12 rows containing missing values (geom_point).
##
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1080.7 -121.2 -25.5 67.7 5391.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -2.187921 0.119753 -18.270 < 2e-16 ***
## nameg_dpm7 0.126146 0.146677 0.860 0.389813
## nameg_tcpm7 1.044631 0.115677 9.031 < 2e-16 ***
## nameg_tdpm7 1.371267 0.115770 11.845 < 2e-16 ***
## get(vaxVar):nameg_cpm7 0.056844 0.002804 20.273 < 2e-16 ***
## get(vaxVar):nameg_dpm7 -0.006076 0.003420 -1.776 0.075735 .
## get(vaxVar):nameg_tcpm7 0.014801 0.002703 5.476 4.57e-08 ***
## get(vaxVar):nameg_tdpm7 -0.010075 0.002706 -3.723 0.000199 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 313.8 on 5107 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5401, Adjusted R-squared: 0.5394
## F-statistic: 749.6 on 8 and 5107 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## Warning: Removed 12 rows containing missing values (geom_point).
##
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1116.6 -124.4 -28.4 65.1 5375.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -2.097971 0.128288 -16.354 < 2e-16 ***
## nameg_dpm7 0.232470 0.156932 1.481 0.138579
## nameg_tcpm7 1.096363 0.124439 8.810 < 2e-16 ***
## nameg_tdpm7 1.396290 0.124521 11.213 < 2e-16 ***
## get(vaxVar):nameg_cpm7 0.045770 0.002522 18.149 < 2e-16 ***
## get(vaxVar):nameg_dpm7 -0.007204 0.003069 -2.348 0.018929 *
## get(vaxVar):nameg_tcpm7 0.011354 0.002442 4.649 3.42e-06 ***
## get(vaxVar):nameg_tdpm7 -0.008938 0.002445 -3.656 0.000259 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 316.3 on 5107 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5327, Adjusted R-squared: 0.5319
## F-statistic: 727.6 on 8 and 5107 DF, p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## Warning: Removed 12 rows containing missing values (geom_point).
##
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1061.4 -138.3 -35.7 59.2 5315.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -1.137473 0.148243 -7.673 2.00e-14 ***
## nameg_dpm7 0.351819 0.180899 1.945 0.051850 .
## nameg_tcpm7 1.154717 0.144110 8.013 1.38e-15 ***
## nameg_tdpm7 1.286266 0.144147 8.923 < 2e-16 ***
## get(vaxVar):nameg_cpm7 0.018417 0.002039 9.031 < 2e-16 ***
## get(vaxVar):nameg_dpm7 -0.006661 0.002468 -2.698 0.006992 **
## get(vaxVar):nameg_tcpm7 0.007065 0.001972 3.582 0.000344 ***
## get(vaxVar):nameg_tdpm7 -0.004668 0.001973 -2.366 0.018018 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 324.1 on 5107 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5091, Adjusted R-squared: 0.5084
## F-statistic: 662.1 on 8 and 5107 DF, p-value: < 2.2e-16
# Example using different time period
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 186)) %>%
makeIntegratedCountyPlotData() %>%
plotIntegratedCounty(popRange=c(25000, 500000),
vaxVar = "mean_vxcgte18pct",
yLabel = "Growth rate (most recent 6 months)",
plotTitle = "Association of 6-month change in burden with vaccination"
) %>%
regIntegratedCounty(vaxVar = "mean_vxcgte18pct")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 14 rows containing non-finite values (stat_smooth).
## Warning: Removed 14 rows containing missing values (geom_point).
##
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -792.2 -78.1 -15.3 36.8 3325.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## nameg_cpm7 -2.1997632 0.1044998 -21.050 < 2e-16 ***
## nameg_dpm7 -0.8992707 0.1140228 -7.887 3.73e-15 ***
## nameg_tcpm7 0.0536133 0.1004151 0.534 0.593
## nameg_tdpm7 0.4373654 0.1004151 4.356 1.35e-05 ***
## get(vaxVar):nameg_cpm7 0.0307740 0.0016130 19.079 < 2e-16 ***
## get(vaxVar):nameg_dpm7 0.0075059 0.0017591 4.267 2.02e-05 ***
## get(vaxVar):nameg_tcpm7 0.0109487 0.0015495 7.066 1.80e-12 ***
## get(vaxVar):nameg_tdpm7 -0.0004757 0.0015495 -0.307 0.759
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 215.4 on 5347 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.3859, Adjusted R-squared: 0.385
## F-statistic: 420 on 8 and 5347 DF, p-value: < 2.2e-16
The latest county-level burden data are downloaded:
urlMapper[["usafCase"]] <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
urlMapper[["usafDeath"]] <- "https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv"
urlMapper[["usafPop"]] <- "https://static.usafacts.org/public/data/covid-19/covid_county_population_usafacts.csv"
readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220606.csv",
"usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220606.csv"
)
compareList <- list("usafCase"=readFromRDS("cty_newdata_20220507")$dfRaw$usafCase,
"usafDeath"=readFromRDS("cty_newdata_20220507")$dfRaw$usafDeath
)
# Use existing clusters
cty_newdata_20220606 <- readRunUSAFacts(maxDate="2022-06-04",
downloadTo=lapply(readList,
FUN=function(x) if(file.exists(x)) NA else x
),
readFrom=readList,
compareFile=compareList,
writeLog="./RInputFiles/Coronavirus/USAF_NewData_20220606_chk_v005.log",
ovrwriteLog=TRUE,
useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
skipAssessmentPlots=FALSE,
brewPalette="Paired"
)
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 29
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220606_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220606_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 7 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220606_chk_v005.log
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `County Name` = col_character(),
## State = col_character(),
## StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
##
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS
##
##
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
##
##
## Checking for similarity of: column names
## In reference but not in current:
## In current but not in reference:
##
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 29
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220606_chk_v005.log
##
## Checking for similarity of: county
## In reference but not in current:
## In current but not in reference:
##
##
## ***Differences of at least 5 and at least 5%
##
## 1 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220606_chk_v005.log
##
##
## ***Differences of at least 0 and at least 0.1%
##
## 11 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220606_chk_v005.log
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType cases new_cases n
## <chr> <dbl> <dbl> <dbl>
## 1 before 2.64e+10 8.23e+7 2758752
## 2 after 2.62e+10 8.17e+7 2714688
## 3 pctchg 5.51e- 3 6.79e-3 0.0160
##
##
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
## isType deaths new_deaths n
## <chr> <dbl> <dbl> <dbl>
## 1 before 4.16e+8 998837 2758752
## 2 after 4.05e+8 947217 2714688
## 3 pctchg 2.72e-2 0.0517 0.0160
## NULL
# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20220606$useClusters,
caption="Includes only counties with 25k+ population",
brewPalette="viridis"
)
# Save the refreshed file
saveToRDS(cty_newdata_20220606, ovrWriteError=FALSE)